Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files. Get free and professional technical support for Spire.Doc for .NET, Java, Android, C++, Python.

Tue Jul 05, 2022 11:01 pm

Hi,

I have HTML content that contains data attributes. Is there a way to preserve these data attributes when converting HTML to Word?
E.g.: <span data-type="some type" data-id="1234">Some Name</span>

More generally, is there a way to preserve the HTML exactly as written, without losing anything in the HTML tags?
For example, my HTML string was: "<div data-wrapper=\"true\" style=\"font-size:9pt;font-family:'Segoe UI','Helvetica Neue',sans-serif;\"><div>Test</div></div>"
But when I converted the HTML to Word and then converted the Word document back to HTML, I got the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<title></title>
<style type="text/css">
body {
font-family: 'Times New Roman';
font-size: 1em;
}

ul,
ol {
margin-top: 0;
margin-bottom: 0;
}

.Normal {
page-break-inside: auto;
page-break-after: auto;
page-break-before: auto;
margin-top: 0pt;
margin-bottom: 0pt;
margin-left: 0pt;
text-indent: 0pt;
border-top-style: none;
border-left-style: none;
border-right-style: none;
border-bottom-style: none;
font-size: 12pt;
font-family: 'Times New Roman';
mso-fareast-font-family: 'Times New Roman';
mso-bidi-font-family: 'Times New Roman';
lang: EN-US;
mso-fareast-language: EN-US;
mso-ansi-language: AR-SA;
}
</style>
</head>

<body style="pagewidth:595.35pt;pageheight:841.95pt;">
<div class="Section0">
<div style="min-height:20pt" />
<p class="Normal">Test
</p>
</div>
</body>

</html>

My scenario is this: I export HTML to Word, the Word document can then be edited, and the edited document is imported and converted back to HTML. I keep the original HTML string throughout, and I want to detect whether the document was changed (without using track changes) by checking whether the new HTML differs from the original HTML.
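Because markup does not survive the HTML to Word to HTML round trip unchanged (as the support reply in this thread explains), a raw string comparison will almost always report a difference. A minimal sketch of an alternative, assuming that only changes to the visible text matter: extract the text from both HTML strings with Python's standard `html.parser`, collapse whitespace, and compare. The function names `visible_text` and `content_changed` and the chosen set of block-level tags are illustrative assumptions, not part of Spire.Doc. Formatting-only edits (bold, fonts, colors) will not be detected by this approach.

```python
from html.parser import HTMLParser

# Elements whose text is never shown to the reader (CSS, scripts, page title).
NON_VISIBLE = {"style", "script", "title"}
# Block-level elements: their boundaries are treated as whitespace so that
# "<div>a</div><div>b</div>" and "<p>a</p><p>b</p>" yield the same text.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "table", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6", "blockquote"}


class _TextExtractor(HTMLParser):
    """Collects visible text nodes; tags and attributes are ignored."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes &amp; and friends
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_VISIBLE:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the document's text with every run of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def content_changed(original_html: str, round_tripped_html: str) -> bool:
    """True if the visible text differs; markup-only differences are ignored."""
    return visible_text(original_html) != visible_text(round_tripped_html)


# Usage with an abridged version of the strings quoted in this thread.
original = '<div data-wrapper="true" style="font-size:9pt;"><div>Test</div></div>'
round_tripped = (
    '<html><head><title></title><style type="text/css">body { font-size: 1em; }'
    '</style></head><body><div class="Section0"><div style="min-height:20pt" />'
    '<p class="Normal">Test\n</p></div></body></html>'
)
print(content_changed(original, round_tripped))  # False
```

Inline tags such as `<span>` add nothing when the text is joined, so Word splitting one run into several spans does not register as a change, while block boundaries become spaces. Whether ignoring formatting edits is acceptable depends on what "changed" means in your workflow.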

jalpaashara
 

Wed Jul 06, 2022 9:09 am

Hi,

Thank you for your inquiry.
Kindly note that the HTML structure is different from the Word document structure. When a Word document is converted to new HTML, the tags will not be exactly the same as in the original HTML code, because Spire.Doc follows the rules of Microsoft Word. If you do the same with Microsoft Word (open the HTML file and save it as a new HTML file), you will find that you still can't get the same HTML tags. Please feel free to contact us if you have any questions.

Sincerely,
Kylie
E-iceblue support team

kylie.tian
 

Thu Feb 02, 2023 10:10 am

Hello Spire.Doc team,

When we save Word document content in "Web Page" format, as you rightly pointed out, it is saved with a huge set of Word styling tags, which makes the document much bulkier. However, if we save the same Word document content in "Web Page, Filtered" format, the resulting .htm file is almost 6 times smaller than the one generated for the normal "Web Page" format.

Can you point us to a Spire.Doc API that converts Word content to "Web Page, Filtered" format?

[Attached screenshot: WebPageFilteredFormat.jpg]


Thanks,
Prad

pradhumna
 

Fri Feb 03, 2023 5:52 am

Hi,

Thanks for your inquiry.
Converting a Word document to "Web Page, Filtered" format is not supported by Spire.Doc at present, but I have added it to our product upgrade system under ticket number SPIREDOC-9024. Once it is implemented, I will inform you immediately.

Sincerely,
Triste
E-iceblue support team

Triste.Dai
 

Fri Feb 03, 2023 8:02 am

Thanks very much, team. We look forward to receiving this product update. Here is some more context on why we need an optimized HTML conversion: when our original HTML text is converted to Word, its size grows to over 10-12 times that of the original content. Therefore, when we convert this Word text back to HTML to store in our source system, the system rejects the content because of its size limits. Moreover, this large volume of content causes excessive processing times, which leads to a poor end-user experience.

From your latest post, we understand that you will provide an API to save Word content in "Web Page, Filtered" format, which will produce size-optimized HTML. A better solution would be an API that produces clean HTML without any Word styling; the size reduction from such an API would be even greater than with the "Web Page, Filtered" format. Thank you for a great support experience.
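Until such an API exists, one possible client-side workaround is to post-process the HTML after conversion. The sketch below uses only Python's standard library and assumes the output looks like the sample quoted earlier in this thread: it drops the `<head>` (including the `<style>` block), comments and the DOCTYPE, unwraps `<html>` and `<body>`, and removes `class`, `style` and `lang` attributes while keeping all other attributes (such as `data-*`). The function `strip_word_styling` is hypothetical, not a Spire.Doc feature, and it is lossy: any formatting carried by the removed attributes is discarded.

```python
import html
from html.parser import HTMLParser

# Elements dropped together with everything inside them.
DROP_ELEMENTS = {"head", "style", "script"}
# Wrapper elements whose tags are removed but whose content is kept.
UNWRAP = {"html", "body"}
# Presentation attributes removed from every remaining element.
DROP_ATTRS = {"class", "style", "lang"}
# Void elements may be written self-closing; others get an explicit end tag.
VOID = {"br", "hr", "img", "input", "meta", "link", "col", "area", "wbr"}


class _WordStyleStripper(HTMLParser):
    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: text arrives decoded
        self._skip_depth = 0
        self.out: list[str] = []

    def _attrs(self, attrs) -> str:
        parts = []
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if value is None:  # boolean attribute, e.g. <input disabled>
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{html.escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag in DROP_ELEMENTS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag not in UNWRAP:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # XHTML-style "<tag ... />"; the sample output in this thread has one.
        if self._skip_depth or tag in DROP_ELEMENTS or tag in UNWRAP:
            return
        if tag in VOID:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}></{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_ELEMENTS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth == 0 and tag not in UNWRAP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.out.append(html.escape(data, quote=False))

    # handle_comment and handle_decl are not overridden, so comments and the
    # DOCTYPE are silently dropped.


def strip_word_styling(word_html: str) -> str:
    """Return the HTML with Word-style presentation markup removed."""
    parser = _WordStyleStripper()
    parser.feed(word_html)
    parser.close()
    return "".join(parser.out).strip()


sample = (
    '<html><head><style type="text/css">.Normal { font-size: 12pt; }</style></head>'
    '<body style="pagewidth:595.35pt;"><div class="Section0">'
    '<div style="min-height:20pt" /><p class="Normal">Test</p></div></body></html>'
)
print(strip_word_styling(sample))  # <div><div></div><p>Test</p></div>
```

The `class` and `style` attributes are removed together with the `<style>` block because the class names only make sense alongside that stylesheet; keeping one without the other would leave dangling references.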

pradhumna
 

Fri Feb 03, 2023 10:04 am

Hi,

Thanks for your feedback.
I have recorded your suggestion in our system, and our developers will evaluate it. Once it is implemented, I will keep you informed. Thanks for your suggestion!

Sincerely,
Triste
E-iceblue support team

Triste.Dai
 
