I have an application that extracts and searches for text in PDF files using the Spire.PDF library.
Extracting and finding text is very slow for some PDF files.
This is the code I used for testing:
// Accumulate the total time spent in ExtractText and FindText across all pages.
Stopwatch extractTextStopwatch = new Stopwatch();
Stopwatch findTextStopwatch = new Stopwatch();

PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile(@"C:\test\test.pdf");

PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsShowHiddenText = true;

foreach (PdfPageBase page in pdf.Pages)
{
    // Time only the text extraction call for this page.
    extractTextStopwatch.Start();
    string pageText = page.ExtractText(options);
    extractTextStopwatch.Stop();

    PdfTextFindOptions textFindOptions = new PdfTextFindOptions();
    textFindOptions.IsShowHiddenText = true;

    // Time only the whole-word search call for this page.
    findTextStopwatch.Start();
    PdfTextFindCollection findTextCollection = page.FindText("123", Spire.Pdf.General.Find.TextFindParameter.WholeWord, textFindOptions);
    findTextStopwatch.Stop();
}

Console.WriteLine("ExtractText time: " + extractTextStopwatch.ElapsedMilliseconds);
Console.WriteLine("FindText time: " + findTextStopwatch.ElapsedMilliseconds);
Console.Read();
I have sent a sample PDF and the Visual Studio solution to support@e-iceblue.com.
On my local computer (Windows 10, 8 GB of RAM), using the Spire.OfficeFor.NETStandard 7.9.2 library, the total times for extracting and finding text are:
ExtractText time: 49393 ms
FindText time: 42359 ms
Is it possible to optimise this?
Thanks,
Filip