Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Fri Aug 07, 2020 10:59 pm

Hello,
I am trying to develop an application that extracts words from PDF documents based on their locations (rectangle coordinates).
My problem is that when I select a specific word for extraction, the extractor does not return it reliably, even when the rectangle I define covers the word's location. In some cases it returns the correct value; in other cases it returns nothing at all.
The attached document describes six trials I ran to investigate the root cause. The only thing I changed between trials is the width value of the word's rectangle.
Please review the code described in the attached Word document (Spire PDF Extraction Issue Description.docx). I have also attached the PDF document (495.pdf) to help you reproduce the problem.

I am using Spire.PDF version 6.8.1.
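A hand-drawn rectangle rarely matches a word's glyph bounds exactly, so one defensive measure (an assumption, not a confirmed fix for the behaviour described here) is to pad the selection by a small tolerance before extracting. The C# sketch below uses only `System.Drawing.RectangleF`; `PadSelection`, the 3-point tolerance and the A4 page size are hypothetical choices, not part of Spire.PDF.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not a Spire.PDF API): grows a user-drawn selection by
// `tolerance` points on every side, then clips it to the page area so the
// padded rectangle never extends past the page edges.
static RectangleF PadSelection(RectangleF selection, float tolerance, SizeF pageSize)
{
    RectangleF grown = RectangleF.Inflate(selection, tolerance, tolerance);
    RectangleF pageArea = new RectangleF(PointF.Empty, pageSize);
    return RectangleF.Intersect(grown, pageArea);
}

// Assumed A4 page size in points and an assumed 3-point tolerance
SizeF a4 = new SizeF(595f, 842f);

// A selection drawn slightly inside a word's bounds
RectangleF userRect = new RectangleF(380f, 527f, 85f, 8f);
RectangleF padded = PadSelection(userRect, 3f, a4);
Console.WriteLine(padded);

// A selection near the page corner is clipped at the page edge
RectangleF corner = PadSelection(new RectangleF(1f, 1f, 10f, 10f), 3f, a4);
Console.WriteLine(corner);
```

The padded rectangle would then be passed to `page.ExtractText(rect)`. A larger tolerance makes it more likely that neighbouring words are picked up as well, so the value is a trade-off to tune against real documents.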

NasserTohamy
 
Posts: 19
Joined: Fri Jun 19, 2020 10:40 pm

Mon Aug 10, 2020 6:58 am

Hello,

Thanks for your inquiry.
Which tool do you use to measure the rectangular area of the text?
I tested your scenario and did notice the behavior you mentioned. I have posted this issue to our Dev team with the ticket SPIREPDF-3483 for further investigation. If there is any update, we will let you know.
In addition, I used the FindText method to search for the text “FJB1805948730288” and got these bounds: X = 374.750977, Y = 525.2901, Width = 96.90252, Height = 11.9828033. When I defined a rectangle with these coordinates and extracted the text from it, the result was correct. Below is the code I used; you can try it on your side.
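    using System;
    using System.Drawing;
    using Spire.Pdf;
    using Spire.Pdf.General.Find;

    // Load the sample document (replace XXX.pdf with the real path)
    PdfDocument doc = new PdfDocument("XXX.pdf");
    foreach (PdfPageBase page in doc.Pages)
    {
        // Find every occurrence of the target text on this page
        PdfTextFind[] result = page.FindText("FJB1805948730288", TextFindParameter.None).Finds;

        foreach (PdfTextFind find in result)
        {
            // Get the bounds reported for this match
            RectangleF rect = find.Bounds;
            // Extract the text inside those bounds
            string text1 = page.ExtractText(rect);
            Console.WriteLine(text1);
        }
    }
    doc.Close();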
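The thread does not say how Spire.PDF decides whether a glyph lies inside the rectangle passed to `ExtractText`. As an illustration only, the sketch below compares a full-containment test with an overlap test against the bounds FindText reported above, using two hypothetical user selections. A containment-style rule would drop the word as soon as the selection's width stops short of the word's right edge, which is one plausible reading of the width-dependent results, not a confirmed diagnosis.

```csharp
using System;
using System.Drawing;

// Bounds that FindText reported for "FJB1805948730288" in this thread
var wordBounds = new RectangleF(374.750977f, 525.2901f, 96.90252f, 11.9828033f);

// Two hypothetical user selections: one slightly wider than the word,
// one whose width stops a few points short of the word's right edge
var wideSelection = new RectangleF(372f, 523f, 102f, 16f);
var narrowSelection = new RectangleF(372f, 523f, 95f, 16f);

foreach (var (name, sel) in new[] { ("wide", wideSelection), ("narrow", narrowSelection) })
{
    // Contains: the whole word lies inside the selection
    bool fully = sel.Contains(wordBounds);
    // IntersectsWith: the selection overlaps at least part of the word
    bool partly = sel.IntersectsWith(wordBounds);
    Console.WriteLine($"{name}: contains={fully}, intersects={partly}");
}
// Prints:
// wide: contains=True, intersects=True
// narrow: contains=False, intersects=True
```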


Sincerely,
Elena
E-iceblue support team

Elena.Zhang
 
Posts: 279
Joined: Thu Jul 23, 2020 1:18 am

Wed Aug 12, 2020 6:04 pm

Hello,
Thank you for your response.
First, I should point out that I am building an application in which the user draws a rectangle over a specific word in a PDF viewer. Our application captures the coordinates of the drawn rectangle and retrieves the text inside it.
Accordingly, we have no control over how the user draws the rectangle; all we need from the Spire extractor is to retrieve the text within the drawn rectangle correctly.
To debug and investigate the problem, we found a free tool called “PDF Multitool” that reports the rectangle coordinates (X, Y, Width, Height) of words in PDF documents. We use this tool only to identify the coordinates and to check that our extraction is correct. You can download it from the following link: (https://cdn.bytescout.com/PDFMultitool. ... 1596710050)
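Because the rectangle comes from a viewer and is cross-checked with a third-party tool, it is worth confirming that all of them use the same units and origin before comparing numbers. The sketch below assumes the viewer reports pixels measured from the page's top-left corner at a known DPI and zoom, and that the target is points (1/72 inch) with a top-left origin. PDF's default user space typically measures Y upward from the bottom of the page, so a second helper flips the origin. Which convention PDF Multitool and Spire.PDF's `ExtractText` actually use is not stated in this thread and should be checked, for example against the FindText bounds of a known word. `PixelsToPoints` and `BottomLeftToTopLeft` are hypothetical helpers.

```csharp
using System;
using System.Drawing;

// Hypothetical: converts a rectangle drawn in a viewer (pixels, top-left origin)
// into PDF points, assuming the page is rendered at `dpi` * `zoom` and its
// top-left corner sits at viewer pixel (0, 0) (no scroll or margin offset).
static RectangleF PixelsToPoints(RectangleF px, float dpi, float zoom)
{
    // 1 point = 1/72 inch, so points = pixels * 72 / (dpi * zoom)
    float scale = 72f / (dpi * zoom);
    return new RectangleF(px.X * scale, px.Y * scale, px.Width * scale, px.Height * scale);
}

// Hypothetical: converts a rectangle whose Y is measured from the bottom of the
// page into one whose Y is measured from the top, given the page height in points.
static RectangleF BottomLeftToTopLeft(RectangleF r, float pageHeight)
{
    return new RectangleF(r.X, pageHeight - (r.Y + r.Height), r.Width, r.Height);
}

// A 200 x 40 px selection drawn at 96 DPI and 100% zoom
RectangleF points = PixelsToPoints(new RectangleF(500f, 700f, 200f, 40f), 96f, 1f);
Console.WriteLine(points);

// A rectangle reported with a bottom-left origin on an 842-point-high page
RectangleF topLeft = BottomLeftToTopLeft(new RectangleF(100f, 700f, 50f, 12f), 842f);
Console.WriteLine(topLeft);
```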

NasserTohamy
 

Thu Aug 13, 2020 7:31 am

Hi,

Thanks for providing more information. I will forward it to our Dev team, and we will keep you informed of any progress on the issue SPIREPDF-3483.

Sincerely,
Elena
E-iceblue support team

Elena.Zhang
 

Fri Aug 21, 2020 9:47 am

Hello,
Please let me know whether there are any updates on the reported issue, or when I can expect a response.

NasserTohamy
 

Fri Aug 21, 2020 10:32 am

Hello,

Thank you for your follow-up.
Unfortunately, our Dev team is still investigating your issue. Please allow us some more time; I will keep you informed of any update. Sorry for the inconvenience caused.

Sincerely,
Rachel
E-iceblue support team

Elena.Zhang
 

Fri Aug 28, 2020 12:31 pm

Hello,
Please let me know whether there are any updates on the reported issue, or when I can expect a response.

NasserTohamy
 

Mon Aug 31, 2020 9:07 am

Hello,

Sorry for the late reply due to the weekend.
I just checked the status of your issue and found that it has not been resolved yet. I have urged our Dev team and given your issue a high priority. Unfortunately, we cannot give you an estimated time at the moment, but we will notify you as soon as the issue is resolved. We apologize for the delay and inconvenience.

Sincerely,
Elena
E-iceblue support team

Elena.Zhang
 

Sat Sep 19, 2020 1:56 pm

Hello,
Please let me know whether there are any updates on the reported issue, or when I can expect a response.

NasserTohamy
 

Mon Sep 21, 2020 2:03 am

Hello,

Thanks for following up.
I just heard from our Dev team that your issue has been resolved and is now in the testing phase. If it passes testing, we will prepare a hotfix for you.
Sorry for the delay and inconvenience caused.

Sincerely,
Elena
E-iceblue support team

Elena.Zhang
 

Mon Sep 28, 2020 9:35 am

Hello,

Hope you are doing well.
We are glad to inform you that we have just released Spire.PDF Pack (Hot Fix) Version 6.9.16, which fixes your issue. You can download it from the following links:
Our website: https://www.e-iceblue.com/Download/down ... t-now.html
Nuget: https://www.nuget.org/packages/Spire.PDF/6.9.16

Sincerely,
Elena
E-iceblue support team

Elena.Zhang
 
