Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files from any .NET platform (C#, VB.NET, ASP.NET, .NET Core) and Java applications (J2SE and J2EE) with fast and high quality performance.

Tue Jul 05, 2022 11:01 pm

Hi,

I have HTML content that contains data tags (data-* attributes). Is there a way to preserve them when converting HTML to Word?
Eg: <span data-type="some type" data-id="1234">Some Name</span>

Is there a way to preserve the HTML markup as is, without losing any of the tags or attributes?
For example, my HTML string was: "<div data-wrapper=\"true\" style=\"font-size:9pt;font-family:'Segoe UI','Helvetica Neue',sans-serif;\"><div>Test</div></div>"
But when I converted the HTML to Word and then converted the Word document back to HTML, I got the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<title></title>
<style type="text/css">
body {
font-family: 'Times New Roman';
font-size: 1em;
}

ul,
ol {
margin-top: 0;
margin-bottom: 0;
}

.Normal {
page-break-inside: auto;
page-break-after: auto;
page-break-before: auto;
margin-top: 0pt;
margin-bottom: 0pt;
margin-left: 0pt;
text-indent: 0pt;
border-top-style: none;
border-left-style: none;
border-right-style: none;
border-bottom-style: none;
font-size: 12pt;
font-family: 'Times New Roman';
mso-fareast-font-family: 'Times New Roman';
mso-bidi-font-family: 'Times New Roman';
lang: EN-US;
mso-fareast-language: EN-US;
mso-ansi-language: AR-SA;
}
</style>
</head>

<body style="pagewidth:595.35pt;pageheight:841.95pt;">
<div class="Section0">
<div style="min-height:20pt" />
<p class="Normal">Test
</p>
</div>
</body>

</html>

My scenario is that I have to export HTML to Word. The Word document can then be edited, after which it is imported and converted back to HTML. Throughout this process I keep the original HTML string, and I want to know whether the document was changed (without using track changes) by checking whether the new HTML differs from the original.

jalpaashara

Wed Jul 06, 2022 9:09 am

Hi,

Thank you for your inquiry.
Please note that the HTML document structure is different from the Word document structure. When a Word document is converted to new HTML, the tags will not be exactly the same as in the original HTML code, because Spire.Doc follows the rules of Microsoft Word. If you perform the same operations in Microsoft Word (open the HTML and save it as new HTML), you will find that you still do not get the same HTML tags. Please feel free to contact us if you have any questions.
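
Since the markup does not survive the round trip, comparing the two HTML strings directly will always report a difference. One possible workaround for the change-detection scenario is to compare only the visible text of the two strings. Below is a minimal sketch in plain Python (standard library only). It is not a Spire.Doc API: `visible_text` and `content_changed` are hypothetical helper names, and the sketch assumes that formatting-only edits, whitespace-only edits and the data-* attributes do not need to be detected.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <style>, <script> and <title> content."""

    _SKIP = {"style", "script", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the text content of an HTML string with all whitespace removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Whitespace is dropped entirely (assumption) because the exported HTML
    # adds line breaks between tags that the original does not have.
    return "".join("".join(parser.chunks).split())


def content_changed(original_html, roundtrip_html):
    """True if the visible text differs between the two HTML strings."""
    return visible_text(original_html) != visible_text(roundtrip_html)
```

With the example from the question, both the original string and the exported HTML reduce to the text "Test", so `content_changed` reports no change until the text itself is edited in Word.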

Sincerely,
Kylie
E-iceblue support team
kylie.tian