Slow spire.pdf

Tue Oct 11, 2022 12:00 pm

Hi,
I have an application that extracts and searches text in PDF files using the Spire.PDF library.
Extracting and finding text is very slow for some PDF files.

This is the code I used for testing:

Code: Select all
Stopwatch extractTextStopwatch = new Stopwatch();
Stopwatch findTextStopwatch = new Stopwatch();

PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile(@"C:\test\test.pdf");

PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsShowHiddenText = true;

foreach (PdfPageBase page in pdf.Pages)
{
    extractTextStopwatch.Start();
    string pageText = page.ExtractText(options);
    extractTextStopwatch.Stop();

    PdfTextFindOptions textFindOptions = new PdfTextFindOptions();
    textFindOptions.IsShowHiddenText = true;

    findTextStopwatch.Start();
    PdfTextFindCollection findTextCollection = page.FindText("123", Spire.Pdf.General.Find.TextFindParameter.WholeWord, textFindOptions);
    findTextStopwatch.Stop();
}

Console.WriteLine("ExtractText time: " + extractTextStopwatch.ElapsedMilliseconds);
Console.WriteLine("FindText time: " + findTextStopwatch.ElapsedMilliseconds);
Console.Read();

I sent an example PDF and the Visual Studio solution to support@e-iceblue.com.

On my local computer (Windows 10, 8 GB of RAM), using the Spire.OfficeFor.NETStandard 7.9.2 library, the total times for extracting and finding text are:
ExtractText time: 49393ms
FindText time: 42359ms

Is it possible to optimise this?

Thanks,
Filip

Wed Oct 12, 2022 7:50 am

Hello,

Thanks for your inquiry.
When I tested with Spire.OfficeFor.NETStandard 7.9.2, the process took about 23 s on my system (Win10 x64, 16 GB RAM). When I tested again with Spire.PDF 8.9.16, it completed very quickly. In any case, I have logged the issue with Spire.OfficeFor.NETStandard 7.9.2 in our tracking system as ticket SPIREPDF-5542, and our Dev team will investigate further. In the meantime, could you test with Spire.PDF first?
https://www.nuget.org/packages/Spire.PDF/8.9.16

Sincerely,
Simple
E-iceblue support team

Wed Oct 12, 2022 8:00 am

Hi,
I tried using Spire.PDF 8.9.16 library for testing and it works fine:
ExtractText time: 1108ms
FindText time: 882ms

However, my application runs on Ubuntu, so I need to use the Spire.OfficeFor.NETStandard library.

Best regards,
Filip

Wed Oct 12, 2022 8:45 am

Hello,

Thanks for your feedback.
Your project is based on net6.0 for Ubuntu, right? Once there is any update on SPIREPDF-5542, I will inform you as soon as possible. Thank you in advance for your patience.

Sincerely,
Simple
E-iceblue support team

Wed Oct 12, 2022 9:30 am

Hi,
No, my project is based on .NET Core 2.2.

Best regards,
Filip

Thu Oct 13, 2022 10:05 am

Hello,

Thanks for your feedback.
For a .NET Core 2.2 project, you can also install our Spire.PDF 8.9.16 via NuGet. The DLLs in the package's .NETCoreApp 2.0 folder will be used automatically, and they deliver the same performance on Ubuntu. If you have any further questions, please feel free to contact us.

Sincerely,
Simple
E-iceblue support team

Thu Oct 13, 2022 11:13 am

Hi,
I need to use the Spire.OfficeFor.NETStandard library because of another issue, described in extracting-text-from-pdf-does-not-work-on-ubuntu-t11365.html.
Best regards,
Filip

Fri Oct 14, 2022 3:17 am

Hello,

Thanks for your feedback.
As described in the Microsoft documentation linked below, Microsoft has stated that the "System.Drawing.Common" dependency is no longer supported on non-Windows systems from .NET 6.0 onwards. Your project, however, is based on .NET Core 2.2, so you can run it directly on Ubuntu using Spire.PDF 8.9.16. If you have any further questions, please feel free to contact us.
https://learn.microsoft.com/en-us/dotne ... ndows-only
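
For projects that do target .NET 6, Microsoft's breaking-change note also documents a runtime switch that re-enables System.Drawing.Common on non-Windows platforms. A minimal sketch of a runtimeconfig.template.json placed next to the project file, assuming the project targets .NET 6 exactly (the switch was removed in .NET 7) and that libgdiplus is installed on the Ubuntu machine; whether Spire.PDF works correctly with this switch is not confirmed anywhere in this thread:

```json
{
  "configProperties": {
    "System.Drawing.EnableUnixSupport": true
  }
}
```

This only restores the unsupported .NET 5 behaviour and does not address the performance issue tracked as SPIREPDF-5542.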

Sincerely,
Simple
E-iceblue support team

Mon Oct 31, 2022 9:29 am

Hi,
I tried to switch from Spire.OfficeFor.NETStandard 7.9.2 to the separate Spire.PDF, Spire.Doc and Spire.XLS libraries (latest versions from NuGet).

I ran into a problem. In a .NET Core 2.2 project that references the latest NuGet versions, Spire.Doc 10.10.4 and Spire.PDF 8.10.5, the following code:

Code: Select all
Spire.Doc.Document document = new Spire.Doc.Document();

throws this exception:
System.TypeLoadException: "Could not load type 'spr㕴' from assembly 'Spire.Pdf, Version=8.10.5.0, Culture=neutral, PublicKeyToken=663f351905198cb3'."

The same project contains complex logic that uses several of the libraries Spire offers, which is why I used Spire.OfficeFor.NETStandard.
Perhaps it would be easier to just fix issue SPIREPDF-5542.

Best regards,
Filip

Mon Oct 31, 2022 10:06 am

Hello,

Thanks for your inquiry.
The error occurs because you referenced our independent products (Spire.Doc, Spire.PDF, Spire.XLS) in the same project, which makes the DLLs incompatible with each other. In your case, you need to first remove all of our product DLLs from the project, and then install the latest version of Spire.Office Platinum (Hotfix), version 7.10.0, to avoid the incompatibility. If there are still problems after testing, please feel free to contact us.

Sincerely,
Simple
E-iceblue support team

Mon Oct 31, 2022 10:23 am

Hi,
I can't use that package because of the issue described in extracting-text-from-pdf-does-not-work-on-ubuntu-t11365.html
Best regards,
Filip

Tue Nov 01, 2022 7:13 am

Hello,

Thanks for your inquiry.
As mentioned in our previous reply, Microsoft has stated that the "System.Drawing.Common" dependency is no longer supported on non-Windows systems from .NET 6.0 onwards. You told us earlier that your project is based on .NET Core 2.2, so you can install Spire.Office Platinum (Hotfix) version 7.10.0 directly into your project. We have prepared a working sample project that uses Spire.Office Platinum (Hotfix) version 7.10.0 on .NET Core 2.2, which you can test directly on your Ubuntu system. If you have any further questions, please feel free to contact us.
https://www.e-iceblue.com/downloads/demo/31078Demo.zip

Sincerely,
Simple
E-iceblue support team

Thu Jan 19, 2023 2:33 pm

Simple.Li wrote:
Hello,

Thanks for your feedback.
Your project is based on net6.0 for Ubuntu, right? Once there is any update on SPIREPDF-5542, I will inform you as soon as possible. Thank you in advance for your patience.

Sincerely,
Simple
E-iceblue support team

Hi,

I haven't tried any of your previous proposals because this project is now based on .NET 6.0 for Ubuntu.
So, has issue SPIREPDF-5542 been fixed?

Thanks,
Filip

Fri Jan 20, 2023 1:26 am

Hello,

Thanks for following up.
Issue SPIREPDF-5542 has not been resolved yet because of its complexity. However, I have urged our development team to speed up the fix. Once the issue has been resolved, I'll inform you promptly.

Sincerely,
Abel
E-iceblue support team

Mon Feb 20, 2023 9:31 am

Hi,
Any progress regarding this issue?

Thanks,
Filip
