Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Tue Oct 11, 2022 12:00 pm

Hi,
I have an application that extracts and finds text in PDF files using the Spire.PDF library.
Extracting and finding text is very slow for some PDF files.

This is the code I used for testing:
```csharp
// Each Start/Stop pair adds to the running total, so the two stopwatches
// accumulate the time spent across all pages.
Stopwatch extractTextStopwatch = new Stopwatch();
Stopwatch findTextStopwatch = new Stopwatch();

PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile(@"C:\test\test.pdf");

// Include hidden text in both extraction and search.
PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsShowHiddenText = true;

PdfTextFindOptions textFindOptions = new PdfTextFindOptions();
textFindOptions.IsShowHiddenText = true;

foreach (PdfPageBase page in pdf.Pages)
{
    // Time text extraction for this page.
    extractTextStopwatch.Start();
    string pageText = page.ExtractText(options);
    extractTextStopwatch.Stop();

    // Time a whole-word search for "123" on this page.
    findTextStopwatch.Start();
    PdfTextFindCollection findTextCollection = page.FindText("123", Spire.Pdf.General.Find.TextFindParameter.WholeWord, textFindOptions);
    findTextStopwatch.Stop();
}

Console.WriteLine("ExtractText time: " + extractTextStopwatch.ElapsedMilliseconds);
Console.WriteLine("FindText time: " + findTextStopwatch.ElapsedMilliseconds);

Console.Read();
```


I sent a sample PDF and a Visual Studio solution to support@e-iceblue.com.

On my local computer (Windows 10, 8 GB RAM), using the Spire.OfficeFor.NETStandard 7.9.2 library, the total times for extracting and finding text are:
ExtractText time: 49393ms
FindText time: 42359ms

Is it possible to optimise this?

Thanks,
Filip
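The totals above do not show whether the time is spread evenly across the document or concentrated in a few pages. A minimal sketch, assuming a per-page breakdown would be useful to attach to the bug report: the hypothetical `PageTimer` helper (not part of Spire.PDF) restarts a `Stopwatch` for each page, records the elapsed milliseconds, and lists the slowest page indices. It uses only the .NET base library, so it does not depend on any particular Spire package version.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper (not a Spire.PDF API): times a per-page action so
// that unusually slow pages stand out.
public static class PageTimer
{
    // Runs work(i) for every page index 0..pageCount-1 and returns the
    // elapsed milliseconds for each page, in page order.
    public static long[] TimeEach(int pageCount, Action<int> work)
    {
        var timings = new long[pageCount];
        var sw = new Stopwatch();
        for (int i = 0; i < pageCount; i++)
        {
            sw.Restart();              // reset to zero and start for this page only
            work(i);
            sw.Stop();
            timings[i] = sw.ElapsedMilliseconds;
        }
        return timings;
    }

    // Returns the indices of the n slowest pages, slowest first.
    public static int[] Slowest(long[] timings, int n) =>
        timings.Select((ms, index) => (ms, index))
               .OrderByDescending(t => t.ms)
               .Take(n)
               .Select(t => t.index)
               .ToArray();
}
```

In the test program, the delegate would wrap the call being measured, for example `long[] timings = PageTimer.TimeEach(pdf.Pages.Count, i => pdf.Pages[i].ExtractText(options));` followed by `PageTimer.Slowest(timings, 5)` to get the five slowest page indices. Restarting the stopwatch per page, instead of accumulating with `Start`/`Stop`, is what makes the individual readings available.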

filipfilipfilip
 
Posts: 45
Joined: Mon Jun 14, 2021 10:27 am

Wed Oct 12, 2022 7:50 am

Hello,

Thanks for your inquiry.
I tested with Spire.OfficeFor.NETStandard 7.9.2 and it took about 23 s on my system (Windows 10 x64, 16 GB RAM). When I tested again with Spire.PDF 8.9.16, however, the process completed very quickly. In any case, I have logged the Spire.OfficeFor.NETStandard 7.9.2 issue in our tracking system as SPIREPDF-5542, and our development team will investigate further. In the meantime, could you test with Spire.PDF first?
https://www.nuget.org/packages/Spire.PDF/8.9.16

Sincerely,
Simple
E-iceblue support team

Simple.Li
 
Posts: 248
Joined: Fri Jul 01, 2022 2:33 am

Wed Oct 12, 2022 8:00 am

Hi,
I tested with the Spire.PDF 8.9.16 library and it works fine:
ExtractText time: 1108ms
FindText time: 882ms

However, my application runs on Ubuntu, so I need to use the Spire.OfficeFor.NETStandard library.

Best regards,
Filip

Wed Oct 12, 2022 8:45 am

Hello,

Thanks for your feedback.
Your project targets .NET 6.0 on Ubuntu, right? I will let you know as soon as there is any update on SPIREPDF-5542. Thank you in advance for your patience.

Sincerely,
Simple
E-iceblue support team

Wed Oct 12, 2022 9:30 am

Hi,
No, my project is based on .NET Core 2.2.

Best regards,
Filip

Thu Oct 13, 2022 10:05 am

Hello,

Thanks for your feedback.
For a .NET Core 2.2 project, you can also install Spire.PDF 8.9.16 via NuGet. The package's .NETCoreApp 2.0 DLLs will be used automatically, and they give the same performance on Ubuntu. If you have any further questions, please feel free to contact us.

Sincerely,
Simple
E-iceblue support team

Thu Oct 13, 2022 11:13 am

Hi,
I need to use the Spire.OfficeFor.NETStandard library because of another issue, described in extracting-text-from-pdf-does-not-work-on-ubuntu-t11365.html.

Best regards,
Filip

Fri Oct 14, 2022 3:17 am

Hello,

Thanks for your feedback.
As described in the Microsoft documentation linked below, System.Drawing.Common is no longer supported on non-Windows systems from .NET 6.0 onwards. Since your project targets .NET Core 2.2, you can run it on Ubuntu with Spire.PDF 8.9.16 as normal. If you have any further questions, please feel free to contact us.
https://learn.microsoft.com/en-us/dotne ... ndows-only

Sincerely,
Simple
E-iceblue support team

Mon Oct 31, 2022 9:29 am

Hi,
I tried to switch from Spire.OfficeFor.NETStandard 7.9.2 to the separate Spire.PDF, Spire.Doc and Spire.XLS libraries (latest versions from NuGet).

I have an issue: in a .NET Core 2.2 project that references the latest NuGet versions, Spire.Doc 10.10.4 and Spire.PDF 8.10.5, the following code:
```csharp
Spire.Doc.Document document = new Spire.Doc.Document();
```

throws this exception:
System.TypeLoadException: "Could not load type 'spr㕴' from assembly 'Spire.Pdf, Version=8.10.5.0, Culture=neutral, PublicKeyToken=663f351905198cb3'."

The same project contains complex logic that uses several Spire libraries, which is why I used Spire.OfficeFor.NETStandard.
Perhaps it would be easier to just fix issue SPIREPDF-5542.

Best regards,
Filip

Mon Oct 31, 2022 10:06 am

Hello,

Thanks for your inquiry.
The error occurs because you referenced our standalone products (Spire.Doc, Spire.PDF, Spire.XLS) in the same project, and their DLLs are not compatible with one another. In your case, first remove all of our product DLLs from the project, then install the latest Spire.Office Platinum (Hotfix) version, 7.10.0, to avoid the incompatibility. If there are still problems after testing, please feel free to contact us.

Sincerely,
Simple
E-iceblue support team

Mon Oct 31, 2022 10:23 am

Hi,
I can't use that package because of the issue described in extracting-text-from-pdf-does-not-work-on-ubuntu-t11365.html.

Best regards,
Filip

Tue Nov 01, 2022 7:13 am

Hello,

Thanks for your inquiry.
As we mentioned previously, Microsoft has stated that System.Drawing.Common is no longer supported on non-Windows systems from .NET 6.0 onwards. You told us earlier that your project targets .NET Core 2.2, so you can install Spire.Office Platinum (Hotfix) 7.10.0 directly into your project. We have prepared a runnable project that uses Spire.Office Platinum (Hotfix) 7.10.0 on .NET Core 2.2; you can test it directly on your Ubuntu system. If you have any further questions, please feel free to contact us.
https://www.e-iceblue.com/downloads/demo/31078Demo.zip

Sincerely,
Simple
E-iceblue support team

Thu Jan 19, 2023 2:33 pm

Simple.Li wrote:
You project is based on net6.0 for Ubuntu, right? Once there is any update of SPIREPDF-5542, I will inform you asap. Thank you in advance for your patience.

Hi,

I haven't tried any of your previous proposals because this project now targets .NET 6.0 on Ubuntu.
So, has issue SPIREPDF-5542 been fixed?

Thanks,
Filip

Fri Jan 20, 2023 1:26 am

Hello,

Thanks for your following-up.
Issue SPIREPDF-5542 has not been resolved yet because of its complexity. However, I have asked our development team to speed up the fix, and I will let you know as soon as it is resolved.

Sincerely,
Abel
E-iceblue support team

Abel.He
 
Posts: 951
Joined: Tue Mar 08, 2022 2:02 am

Mon Feb 20, 2023 9:31 am

Hi,
Any progress regarding this issue?

Thanks,
Filip
