Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Mon Jul 04, 2022 8:31 am

Hi,
The Spire.PDF library does not recognise the text in the attached file when I call ExtractText().
Can you check?

This is the entire ASP.NET action, which works on other PDFs:
Code:
public ActionResult Post([FromQuery] IFormFile file)
{
    if (file == null || file.Length == 0)
    {
        return NoContent();
    }

    byte[] fileBytes = null;
    using (var ms = new MemoryStream())
    {
        file.CopyTo(ms);
        fileBytes = ms.ToArray();
    }

    StringBuilder content = new StringBuilder();
    PdfDocument document = new PdfDocument();
    if (fileBytes != null)
    {
        document.LoadFromBytes(fileBytes);

        foreach (PdfPageBase page in document.Pages)
        {
            string pageText = page.ExtractText();
            if (!string.IsNullOrEmpty(pageText) && !pageText.Equals(Environment.NewLine))
            {
                content.Append(pageText);
            }
        }
    }

    return Ok(content.ToString());
}


Best regards,
Filip

filipfilipfilip
 
Posts: 45
Joined: Mon Jun 14, 2021 10:27 am

Mon Jul 04, 2022 9:38 am

Hi Filip,

Thanks for your inquiry.

Please note that the text in this PDF is hidden, so it cannot be extracted with the default ExtractText() call. Please refer to the code below, which passes a PdfTextExtractOptions object with IsShowHiddenText enabled to the ExtractText method so that the hidden text is included.
Code:
PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile("29.pdf");
PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsShowHiddenText = true;
string text = pdf.Pages[0].ExtractText(options);
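
To apply the same option to every page of an uploaded file, as in your action, something like the following might work. This is only a sketch: the class and method names (HiddenTextSketch, ExtractAllText) are hypothetical, and the using directive for PdfTextExtractOptions is an assumption that may differ between Spire.PDF versions. The calls themselves are the ones already used in this thread.

```csharp
using System.Text;
using Spire.Pdf;
using Spire.Pdf.Texts; // ASSUMPTION: namespace of PdfTextExtractOptions; adjust for your version

public static class HiddenTextSketch
{
    // Hypothetical helper: returns the text of all pages, including hidden text.
    public static string ExtractAllText(byte[] fileBytes)
    {
        var content = new StringBuilder();
        var pdf = new PdfDocument();
        try
        {
            pdf.LoadFromBytes(fileBytes);

            // One options object is reused for every page.
            var options = new PdfTextExtractOptions();
            options.IsShowHiddenText = true;

            foreach (PdfPageBase page in pdf.Pages)
            {
                string pageText = page.ExtractText(options);
                // Skip pages that yield nothing but whitespace.
                if (!string.IsNullOrWhiteSpace(pageText))
                {
                    content.Append(pageText);
                }
            }
        }
        finally
        {
            // Release the document even if extraction throws.
            pdf.Close();
        }
        return content.ToString();
    }
}
```

In the controller action, the loop could then be replaced by a single call such as `return Ok(HiddenTextSketch.ExtractAllText(fileBytes));`.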
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am

Thu Jul 07, 2022 12:44 pm

Hi Andy,
That helped.

I have another, related problem that I can't find a solution for.
I need to find hidden text on a page and draw a rectangle around it.
How do I find the text "15/06/2022" in the attached PDF and draw a rectangle around it?

This is the code that works for text that is not hidden.
It does not work for hidden text, because the FindText method fails to find it:
Code:
PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile(inputPath);
PdfTextFindCollection findTextCollection = pdf.Pages[0].FindText("15/06/2022", TextFindParameter.WholeWord);
RectangleF rectangle = findTextCollection.Finds[0].TextBounds[0];
AdjustRectangleSizeAndPosition(pdf.Pages[0], ref rectangle);
PdfGraphicsState state = pdf.Pages[0].Canvas.Save();
PdfPen pen = new PdfPen(Color.FromArgb(1, 0, 204, 102), 0.7f);
pdf.Pages[0].Canvas.DrawRectangle(pen, rectangle);
pdf.Pages[0].Canvas.Restore(state);
pdf.SaveToFile(outputPath);
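
For reference, a slightly more defensive variant of the same idea is sketched below. It iterates over whatever FindText returns instead of indexing Finds[0], so a page with no match is skipped rather than raising an exception. The class and method names (FindSketch, OutlineMatches) are hypothetical, the namespaces in the using directives are assumptions about the Spire.PDF version in use, and my AdjustRectangleSizeAndPosition step is left out for brevity. It does not by itself make FindText see the hidden text.

```csharp
using System.Drawing;
using Spire.Pdf;
using Spire.Pdf.General.Find; // ASSUMPTION: namespace of TextFindParameter and the find types
using Spire.Pdf.Graphics;     // ASSUMPTION: namespace of PdfPen and PdfGraphicsState

public static class FindSketch
{
    // Hypothetical helper: outlines every match of searchText on the page
    // and returns how many rectangles were drawn (0 when nothing is found).
    public static int OutlineMatches(PdfPageBase page, string searchText, PdfPen pen)
    {
        var matches = page.FindText(searchText, TextFindParameter.WholeWord);
        if (matches == null || matches.Finds == null)
        {
            return 0;
        }

        int drawn = 0;
        PdfGraphicsState state = page.Canvas.Save();
        foreach (var find in matches.Finds)
        {
            // A single match can span several rectangles, e.g. across a line break.
            foreach (RectangleF bounds in find.TextBounds)
            {
                page.Canvas.DrawRectangle(pen, bounds);
                drawn++;
            }
        }
        page.Canvas.Restore(state);
        return drawn;
    }
}
```

A return value of 0 for "15/06/2022" on the attached file would confirm that the search itself is what fails, not the drawing.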


Best regards,
Filip

filipfilipfilip
 

Fri Jul 08, 2022 7:29 am

Hi Filip,

I was able to reproduce this issue. Preliminary analysis suggests it may be caused by a character mapping problem. I have reported it to our Dev team for further investigation and a fix; the issue ticket is SPIREPDF-5333. Sorry for the inconvenience.
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 

Mon Aug 01, 2022 9:11 am

Hi Filip,

Thanks for your patience!

We are glad to inform you that we have just released Spire.Office version 7.7.6, which fixes your issue SPIREPDF-5333.

Please download the fixed version from one of the following links and test it.
Website link:
Code:
https://www.e-iceblue.com/Download/download-office-for-net-now.html

NuGet:
Code:
https://www.nuget.org/packages/Spire.Office/7.7.6
https://www.nuget.org/packages/Spire.Officefor.NETStandard/7.7.6
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 

Thu Aug 25, 2022 2:29 pm

Thanks, it works with the new version of the Spire.Officefor.NETStandard package!

filipfilipfilip
 

Fri Aug 26, 2022 9:22 am

Hi Filip,
You're welcome!
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
