Using Spire to convert DOC to HTML

Tue Jan 17, 2023 1:53 pm

Hi, I am using Spire to convert Word documents to HTML. I see the following options:

1. SaveToFile: This generates a folder with images, a CSS file, and the HTML file. We cannot use this because we are running in an Azure Function, where local file generation is not supported.
2. SaveToStream: This generates a stream containing the HTML. Here we are facing the following issues (see "doc1.html"):
-> The links to images are broken.
-> The header and footer are missing from the converted HTML in some cases. I have attached the Word file (doc1.docx) for which they are not generated.
-> doc1.doc cannot be processed; it fails with "System.InvalidOperationException: This is not a structured storage file." doc1.doc is attached to this post.

Please find below the code we use for this. We want to generate an HTML stream, uploaded to a blob, that displays the images properly.

string cnn = "xxxxxxxx";
string containerName = "xxxxxxxxx";
BlobServiceClient blobServiceClient = new BlobServiceClient(cnn);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);
var blobs = containerClient.GetBlobs();
foreach (BlobItem blobItem in blobs)
{
    //convert .docx files
    if (blobItem.Name.Contains(".docx") || blobItem.Name.Contains(".doc"))
    {
        MemoryStream html = new MemoryStream();
        BlobClient blobClient = containerClient.GetBlobClient(blobItem.Name);
        if (blobClient.ExistsAsync().Result)
        {
            using (var docx = new MemoryStream())
            {
                blobClient.DownloadTo(docx);
                //return ms.ToArray();
                using (MemoryStream stream = new MemoryStream(docx.ToArray()))
                {
                    Document spireDoc = new Document();
                    if (blobItem.Name.ToLower().Contains(".docx"))
                    {
                        spireDoc.LoadFromStream(stream, FileFormat.Docx2013);
                    }
                    else
                    {
                        spireDoc.LoadFromStream(stream, FileFormat.Doc);
                    }
                    //the new docx without image
                    spireDoc.SaveToStream(html, FileFormat.Html);
                    html.Position = 0;
                    blobClient = containerClient.GetBlobClient(
                        blobItem.Name.ToLower().Contains(".docx") ?
                        blobItem.Name.Replace(".docx", ".html") : blobItem.Name.Replace(".doc", ".html"));
                    await blobClient.UploadAsync(html, true);
                }
            }
        }
    }
}
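
A side note on the name checks in the code above: Contains(".doc") is also true for ".docx" and for names such as "notes.docs.txt", and string.Replace swaps every occurrence of ".doc" in the name, not only the extension. A minimal sketch of a stricter check, using only System.IO.Path; the class and method names (BlobNames, IsWordDocument, ToHtmlName) are hypothetical helpers for illustration and not part of Spire.Doc or the Azure SDK:

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Spire.Doc or Azure.Storage.Blobs.
static class BlobNames
{
    // True only when the name's final extension is .doc or .docx (case-insensitive).
    public static bool IsWordDocument(string blobName)
    {
        string ext = Path.GetExtension(blobName);
        return ext.Equals(".doc", StringComparison.OrdinalIgnoreCase)
            || ext.Equals(".docx", StringComparison.OrdinalIgnoreCase);
    }

    // Swaps only the final extension and leaves the rest of the name untouched.
    public static string ToHtmlName(string blobName) =>
        Path.ChangeExtension(blobName, ".html");
}
```

In the loop above, this would replace the two Contains checks with BlobNames.IsWordDocument(blobItem.Name) and the Replace calls with BlobNames.ToHtmlName(blobItem.Name).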

Wed Jan 18, 2023 2:06 am

Hello,

Thanks for your inquiry.
I didn't find the Word document (doc1.docx) in the attachment. Could you please provide it again? You can attach it here or send it to us via email ([email protected]). Thanks in advance for your assistance.
In addition, please provide the following information to help us reproduce your issue and work out a solution for you.
1) Application type, such as Console App, .NET Framework 4.8.

Sincerely
Abel
E-iceblue support team

Wed Jan 18, 2023 11:55 am

Please find the attached zip. It contains 3 files:

doc1.doc = unable to convert this document to HTML.
doc1.html = HTML version of doc1.docx

Project Details
Project Type = Class Library.
.Net Framework version = .Net 6.0
Spire Version, spire.Doc = 11.1.0

Thu Jan 19, 2023 10:08 am

Hello,

Thanks for your feedback.
I created an Azure Functions project and deployed it to Azure to test your scenario, but I could not reproduce your issue. When I test "doc1.docx", I get a result HTML file that includes the image; I have attached it for your reference. I also get an HTML file when I test "doc1.doc". I have put my project on our server; please download it from the following link.
https://www.e-iceblue.com/downloads/att ... l31920.zip

Sincerely
Abel
E-iceblue support team

Thu Jan 19, 2023 12:52 pm

With your code as well, the header and footer are missing in the converted HTML files. Can you verify this on your end and share the converted HTML files?

I am getting the file from a blob as a stream, and when I use this stream, the resulting HTML file is not correct: the images are missing. Can you try this scenario? In the code you shared, you are reading the files from a folder.

For me, the code below fails with an error while reading the .doc file:
spireDoc.LoadFromStream(stream, FileFormat.Doc);

Error message when processing doc1.doc: "System.InvalidOperationException: This is not a structured storage file." doc1.doc is attached to this post.

Note: to replicate this scenario, please read the file from a blob and load it with the LoadFromStream() method.

Fri Jan 20, 2023 9:52 am

Hello,

Thanks for your feedback.
After testing, I have reproduced your issue and reported it to our development team. They will investigate and fix it. Once there are any updates, I'll inform you promptly.

Sincerely
Abel
E-iceblue support team

Sat Jan 28, 2023 8:11 am

Hello,

Greetings from E-iceblue.
I have some information to share with you:
Regarding the header and footer: HTML has no concept of a page header or footer, and the header and footer also disappear when MS Word itself converts a Word document to HTML. Spire.Doc follows the behavior of MS Word, so this is not a bug.

In addition, regarding the images missing from the result HTML file, you need to add the following code to embed the images in the HTML file:

spireDoc.HtmlExportOptions.ImageEmbedded = true;

I put the complete code below for your reference.

foreach (BlobItem blobItem in blobs)
{
    //convert .docx files
    if (blobItem.Name.Contains(".docx") || blobItem.Name.Contains(".doc"))
    {
        MemoryStream html = new MemoryStream();
        BlobClient blobClient = containerClient.GetBlobClient(blobItem.Name);
        if (blobClient.ExistsAsync().Result)
        {
            using (var docx = new MemoryStream())
            {
                blobClient.DownloadTo(docx);
                //return ms.ToArray();
                using (MemoryStream stream = new MemoryStream(docx.ToArray()))
                {
                    Document spireDoc = new Document();
                    if (blobItem.Name.ToLower().Contains(".docx"))
                    {
                        spireDoc.LoadFromStream(stream, FileFormat.Docx2013);
                    }
                    else
                    {
                        spireDoc.LoadFromStream(stream, FileFormat.Doc);
                    }
                    //Embed image to html file
                    spireDoc.HtmlExportOptions.ImageEmbedded = true;
                    //the new docx without image
                    spireDoc.SaveToStream(html, FileFormat.Html);
                    html.Position = 0;
                    blobClient = containerClient.GetBlobClient(
                        blobItem.Name.ToLower().Contains(".docx") ?
                        blobItem.Name.Replace(".docx", ".html") : blobItem.Name.Replace(".doc", ".html"));
                    await blobClient.UploadAsync(html, true);
                }
            }
        }
    }
}
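
For readers wondering how a single HTML stream can carry its images: the usual mechanism is a base64 data URI in the img tag's src attribute. The sketch below only illustrates that general mechanism with plain .NET. It assumes, without confirmation from this thread, that ImageEmbedded produces output of this kind; DataUriExample and ToImgTag are hypothetical names and not part of Spire.Doc.

```csharp
using System;

// Hypothetical illustration of inlining an image as a data URI.
// Not Spire.Doc code; the exact markup Spire.Doc emits may differ.
static class DataUriExample
{
    // Builds an <img> tag whose source is the image bytes encoded as base64,
    // so the HTML needs no separate image file or link.
    public static string ToImgTag(byte[] imageBytes, string mimeType)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        return $"<img src=\"data:{mimeType};base64,{base64}\" />";
    }
}
```

Because the image travels inside the HTML text itself, nothing has to be written to a local folder, which is what makes this approach fit the SaveToStream scenario.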

Finally, regarding the exception thrown when converting doc1.doc to HTML, our Dev team is working on it. Once there are any updates, I'll inform you.

Sincerely
Abel
E-iceblue support team

Sun Jan 29, 2023 5:50 am

Hello,

Greetings from E-iceblue.
I have an update on the exception thrown when converting doc1.doc to HTML:
According to the feedback from our Dev team, you can avoid this issue by setting FileFormat to "Auto" instead of "Docx2013" or "Doc". Please refer to the following code.
If you have any questions, feel free to contact us.

Document spireDoc = new Document();
//FileFormat.Auto is used for both .doc and .docx, so no extension check is needed
spireDoc.LoadFromStream(stream, FileFormat.Auto);
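
One plausible reason "Auto" helps (an assumption; the thread never states what doc1.doc actually contains) is that a file named .doc is not always in the binary OLE "structured storage" format that FileFormat.Doc expects. OLE compound files start with the bytes D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP packages starting with 50 4B 03 04. A minimal sketch for checking a downloaded stream before loading it; WordFormatSniffer and Sniff are hypothetical names, not Spire.Doc API:

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper, not part of Spire.Doc.
static class WordFormatSniffer
{
    // Signature of an OLE compound file (legacy binary .doc).
    static readonly byte[] OleMagic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // Signature of a ZIP local file header (.docx is a ZIP package).
    static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "ole", "zip", or "unknown" based on the first bytes.
    // Requires a seekable stream (e.g. MemoryStream); the position is restored.
    public static string Sniff(Stream stream)
    {
        long start = stream.Position;
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream reached early
            read += n;
        }
        stream.Position = start;

        if (StartsWith(header, read, OleMagic)) return "ole";
        if (StartsWith(header, read, ZipMagic)) return "zip";
        return "unknown";
    }

    static bool StartsWith(byte[] data, int length, byte[] magic)
    {
        if (length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }
}
```

Logging WordFormatSniffer.Sniff(stream) just before LoadFromStream would show whether a blob named .doc is really a binary Word file, a renamed .docx, or something else entirely (for example HTML or RTF saved with a .doc extension).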

Sincerely
Abel
E-iceblue support team

Wed Feb 01, 2023 10:51 am

Thanks for the update.

Fri Feb 03, 2023 1:23 am

You're welcome! Have a nice day!

Sincerely
Abel
E-iceblue support team