Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Sat Mar 24, 2018 2:22 pm

Hello,
I am currently evaluating your PDF library, and I would like to extract text from a PDF file by drawing a rectangle on top of it. The version I downloaded is 4.0.0.2040.
To do this, I convert the PDF into an image and show the image on the form. Then I draw rectangles on top of the image and call ExtractText(new RectangleF(x, y, width, height)) to get the text at those coordinates.

The problem is, the two coordinates do not always match. I have attached the sample project I am working on, also a sample file with which I am testing the result.

Can you help me see what I am doing wrong, or is there a better way to achieve what I am looking for?

Thank you.
Nilarya

nilarya
 
Posts: 20
Joined: Mon Mar 19, 2018 7:23 am

Mon Mar 26, 2018 7:33 am

Hello Nilarya,

Thanks for your post. I have noticed the issue and posted it to our DEV team for further investigation. If there is any update or workaround from them, we will let you know.
In the meantime, to help us look into the issue, would you please share your PDF file with us? You could zip it and attach it here or send it to our email (support@e-iceblue.com).

Best regards,
Simon
E-iceblue support team

Simon.yang
 
Posts: 620
Joined: Wed Jan 11, 2017 2:03 am

Tue Mar 27, 2018 9:27 am

Hello Nilarya,

Glad to inform you that the issue has been fixed. The cause is a unit mismatch when calling ExtractText(): the rectangle drawn in Windows Forms is measured in image pixels, while Spire.Pdf works in points (72 per inch).
Please change
string text = page.ExtractText(new RectangleF(Rect.Left, Rect.Top, Rect.Width, Rect.Height));

into
string text = page.ExtractText(new RectangleF(
    Rect.Left * 72 / pictureBox1.Image.HorizontalResolution,
    Rect.Top * 72 / pictureBox1.Image.VerticalResolution,
    Rect.Width * 72 / pictureBox1.Image.HorizontalResolution,
    Rect.Height * 72 / pictureBox1.Image.VerticalResolution));

to keep the units consistent.
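
As a rough sketch of the same conversion, the scaling could be wrapped in a small helper so it is written only once. The class name UnitConversion and the method PixelsToPoints are made up for this illustration and are not part of Spire.PDF. The sketch assumes the rectangle is expressed in the image's own pixels and that the PDF side expects points (1/72 inch).

```csharp
using System.Drawing;

public static class UnitConversion
{
    // Hypothetical helper (not a Spire.PDF API): converts a rectangle measured
    // in image pixels into PDF points, where 1 point = 1/72 inch.
    public static RectangleF PixelsToPoints(RectangleF pixelRect, float dpiX, float dpiY)
    {
        float scaleX = 72f / dpiX; // points per pixel, horizontally
        float scaleY = 72f / dpiY; // points per pixel, vertically
        return new RectangleF(
            pixelRect.Left * scaleX,
            pixelRect.Top * scaleY,
            pixelRect.Width * scaleX,
            pixelRect.Height * scaleY);
    }
}
```

With Rect, page and pictureBox1 as in the snippet above, the call might then read page.ExtractText(UnitConversion.PixelsToPoints(Rect, pictureBox1.Image.HorizontalResolution, pictureBox1.Image.VerticalResolution)).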

Best regards,
Simon
E-iceblue support team

Simon.yang
 

Tue Mar 27, 2018 2:37 pm

thank you! that works.

nilarya
 

Wed Mar 28, 2018 1:27 am

Hello Nilarya,

Thanks for your feedback. Just feel free to contact us if you encounter any other issue.

Best regards,
Simon
E-iceblue support team

Simon.yang
 

Tue Sep 18, 2018 12:53 pm

Hello,

I have used your code to extract text from a .pdf file, but something seems to be wrong. When I try to extract a single letter, I get a neighbouring letter along with it, and often I get different text from what I selected.
I have attached a screenshot; please take a look.

Thank you,

nilarya
 

Wed Sep 19, 2018 7:29 am

Hello Nilarya,

Thanks for your inquiry.
Extracting a single letter requires 100% accurate coordinates. As you can see, however, the code performs unit conversions when calculating the rectangle coordinates, which inevitably introduces some deviation that we cannot avoid. Hence, I'm afraid it is impossible to extract a single letter with 100% accuracy. I hope you can understand.
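
To put a rough number on one possible source of that deviation: if the rectangle comes from whole-pixel positions on the image (an assumption about the sample project, not something confirmed here), each edge can only be placed in steps of one pixel. The sketch below computes how large that step is in points. PixelGranularity and PointsPerPixel are illustrative names, not part of Spire.PDF.

```csharp
public static class PixelGranularity
{
    // Hypothetical helper: size of one image pixel in PDF points
    // (1 point = 1/72 inch), i.e. the smallest step a rectangle edge
    // can move when it is drawn on whole pixels.
    public static float PointsPerPixel(float dpi)
    {
        return 72f / dpi;
    }
}
```

For an image at 96 DPI this gives 0.75 points per pixel. Rendering the page at a higher resolution would shrink the step but would not remove it.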

Sincerely,
Lisa
E-iceblue support team

Lisa.Li
 
Posts: 1261
Joined: Wed Apr 25, 2018 3:20 am
