Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files. Get free and professional technical support for Spire.Doc for .NET, Java, Android, C++, Python.

Sun Nov 29, 2020 1:20 pm

Hi support,

I'm testing your product (the FreeSpire.Doc NuGet package) to see whether it is a tool that we want to purchase.

The use case is to download a file from any URL (the file type is not known in advance, e.g. see attachment) and use Spire to:

1) Autodetect file type (could be any e.g. pdf, doc/x, xls/x etc.)

2) Convert file (contents) to plain text

I have included my source code below but the problem is that I get an exception when trying to load the file via the Url:

"this is not a structured storage file"

Source code:

HttpClient client = new HttpClient();
Document document = new Document();

using (var file = await client.GetStreamAsync(url).ConfigureAwait(false)) // e.g. the URL in the attachment "url example.png"

using (var memoryStream = new MemoryStream())
{
await file.CopyToAsync(memoryStream);
memoryStream.Position = 0; // rewind so the loader reads from the start of the buffer

document.LoadFromStream(memoryStream, FileFormat.Auto);
}

// Get type
FileFormat fileExtension = document.DetectedFormatType;

// Save source to Text using a stream
MemoryStream streamResult = new MemoryStream();

// Populate stream
document.SaveToStream(streamResult, FileFormat.Txt);

// Get text string
string textContent = Encoding.ASCII.GetString(streamResult.ToArray());


Thank you in advance for your reply.

martingp
 
Posts: 4
Joined: Sun Nov 29, 2020 12:56 pm

Mon Nov 30, 2020 10:07 am

Hello,

Thanks for your inquiry, and sorry for the late reply; it was the weekend.

Kindly note that different file formats are handled by different products. For example, Spire.Doc processes Word documents such as doc/docx; Spire.XLS processes Excel documents such as xls/xlsx; Spire.PDF processes PDF documents; and Spire.Presentation processes PowerPoint documents such as ppt/pptx.

For these reasons, I am sorry that we cannot implement your first requirement, "Autodetect file type (could be any e.g. pdf, doc/x, xls/x etc.)". I recommend that you first determine the format of the stream downloaded from the URL, then load the stream with the corresponding product and convert it to plain text.
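One common way to determine the format before choosing a product is to inspect the first few bytes of the download. The following is only a minimal sketch using the .NET base class library; the names SniffedFormat and FormatSniffer are hypothetical and are not part of any Spire product. It assumes the signature sits at offset 0, which is the usual case but is stricter than what some PDF readers accept.

```csharp
using System;
using System.Text;

// Hypothetical helper types for illustration; not a Spire API.
public enum SniffedFormat { Unknown, Pdf, OleCompound, ZipContainer }

public static class FormatSniffer
{
    // "%PDF-" marks the start of a PDF file.
    private static readonly byte[] PdfMagic = Encoding.ASCII.GetBytes("%PDF-");

    // OLE compound ("structured storage") header, used by legacy doc/xls/ppt.
    private static readonly byte[] OleMagic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header, used by docx/xlsx/pptx and any other ZIP archive.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a buffer by its leading signature bytes.
    public static SniffedFormat Detect(byte[] data)
    {
        if (StartsWith(data, PdfMagic)) return SniffedFormat.Pdf;
        if (StartsWith(data, OleMagic)) return SniffedFormat.OleCompound;
        if (StartsWith(data, ZipMagic)) return SniffedFormat.ZipContainer;
        return SniffedFormat.Unknown;
    }

    // True when data is at least as long as prefix and begins with it.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A call such as FormatSniffer.Detect(memoryStream.ToArray()) could then decide which product to hand the stream to. Note that the OLE and ZIP signatures only identify the container: telling Word from Excel or PowerPoint needs a further look inside (for example, a docx archive contains a word/document.xml entry, while an xlsx archive contains xl/workbook.xml).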

For example, the URL you provided (https://eapb.eu/component/attachments/a ... ml?id=1493) returns a PDF stream, so please load it with Spire.PDF and then convert it to plain text. Please refer to the following code.

using System;
using System.Text;
using System.Threading.Tasks;
using Spire.Pdf;
using System.IO;
using System.Net.Http;

namespace test
{
    class Program
    {

        static void Main(string[] args)
        {
            PdfDocument pdf = GetAsync().Result;
            StringBuilder content = new StringBuilder();
            foreach (PdfPageBase page in pdf.Pages)
            {
                content.Append(page.ExtractText());
            }
            File.WriteAllText("res.txt", content.ToString());
        }
        // Downloads the file into memory and loads it as a PDF document.
        static async Task<PdfDocument> GetAsync()
        {
            string url = "https://eapb.eu/component/attachments/attachments.html?id=1493";
            HttpClient client = new HttpClient();
            PdfDocument pdf = new PdfDocument();
            using (var file = await client.GetStreamAsync(url).ConfigureAwait(false))
            using (var memoryStream = new MemoryStream())
            {
                await file.CopyToAsync(memoryStream);
                memoryStream.Position = 0; // rewind before handing the stream to the loader
                pdf.LoadFromStream(memoryStream);
            }
            return pdf;
        }
    }
}

If you have any other problems, please feel free to contact us.

Sincerely,
Marcia
E-iceblue support team

Marcia.Zhou
 
Posts: 858
Joined: Wed Nov 04, 2020 2:29 am

Wed Dec 09, 2020 10:10 am

Hello,

Hope you are doing well.

Has your issue been solved now? Could you please give us some feedback at your convenience?

Thanks in advance.

Sincerely,
Marcia
E-iceblue support team

Marcia.Zhou
 
Posts: 858
Joined: Wed Nov 04, 2020 2:29 am
