- Code: Select all
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml">........
Below is the code I am using to convert a content control to HTML:
- Code: Select all
private static readonly Encoding LocalEncoding = Encoding.UTF8;

private string ConvertDocumentContentToHTML(StructureDocumentTag content)
{
    try
    {
        // Clone the content control so the source document is left untouched.
        DocumentObject docObj = content.Clone();

        // Build a temporary document that holds only the cloned content control.
        Spire.Doc.Document currComponentDoc = new Spire.Doc.Document();
        Section section = currComponentDoc.AddSection();
        section.Body.ChildObjects.Add(docObj);

        // Save the temporary document as HTML into memory and decode it as UTF-8.
        using (MemoryStream memoryStream = new MemoryStream())
        {
            currComponentDoc.SaveToStream(memoryStream, FileFormat.Html);
            string htmlText = LocalEncoding.GetString(memoryStream.ToArray());
            Console.WriteLine(htmlText);
            htmlText = HtmlParser.RemoveDocType(htmlText); // As a temporary workaround I am removing DOCTYPE from the HTML string; please ignore this line
            return htmlText;
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"{nameof(ConvertDocumentContentToHTML)} - Failed - Error : {ex.Message}");
        throw; // rethrow without resetting the stack trace
    }
}
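The RemoveDocType helper called above is not shown in this post. For context, a minimal sketch of such a workaround, assuming a simple regex-based strip, might look like the following. The class name HtmlParserSketch and the method body are hypothetical and not the actual HtmlParser implementation; the sketch also assumes the DOCTYPE has no internal subset and that the string does not start with a byte order mark.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the HtmlParser helper; the real one is not shown.
public static class HtmlParserSketch
{
    // Matches a single leading <!DOCTYPE ...> declaration, case-insensitively,
    // together with any whitespace around it. Assumes no internal subset,
    // i.e. no '>' characters inside the declaration itself.
    private static readonly Regex DocTypePattern =
        new Regex(@"^\s*<!DOCTYPE[^>]*>\s*", RegexOptions.IgnoreCase);

    public static string RemoveDocType(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Replace at most one match; the pattern is anchored to the start.
        return DocTypePattern.Replace(html, string.Empty, 1);
    }
}
```

Calling HtmlParserSketch.RemoveDocType on the XHTML output would leave the markup starting at the <html xmlns="http://www.w3.org/1999/xhtml"> element. A string without a leading DOCTYPE is returned unchanged.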
My concern is that although XHTML has similarities with HTML, it really is not HTML: their MIME types are different, so are their parsing modes, and there are many more differences between the two. The output is XHTML even though I save to the stream with FileFormat.Html:
- Code: Select all
doc.SaveToStream(memoryStream, FileFormat.Html);
The second part of my question: after converting the content control to HTML, when I convert the same HTML string back to DOCX using Spire.Doc, it adds an extra line of text to the Word document, as follows (screenshot also attached):
- Code: Select all
html xmlns="http://www.w3.org/1999/xhtml">
How do I avoid the above without stripping the <!DOCTYPE> from the HTML string?
Please note that I am using Spire.Doc 10.7.0 with .NET Core 3.1.