Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Thu Jul 09, 2020 2:19 pm

Hello,

I noticed that the new version returns a different result for the rectangle area than older versions such as 5.10.8.2040.

This behavior worries us because we lose our position records with every update. Please look at the attached files. I'm attaching area.zip, which shows graphically the position that extracted the data correctly in version 5.10.8.204. After updating to the current version, these positions no longer work. Here are the parameters represented in area.zip:


X = 23
Y = 54
W = 153
H = 5

PdfPageBase pPage; // the page loaded from the attached PDF
RectangleF rect = new RectangleF(X, Y, W, H);
string text = pPage.ExtractText(rect);

I'm also attaching the PDF file for this position and the graphical area. If you increase H to 6 or 7, the text is extracted. My concern is about this change. I would like to understand whether this will happen frequently, because it is very bad for an application that expects updated components to remain compatible.

Regards,
Fernando.

Dicipulofer
 
Posts: 24
Joined: Wed Apr 08, 2015 1:41 pm

Fri Jul 10, 2020 6:48 am

Hello,

Thanks for your inquiry.
I tested your code using the Spire.PDF v5.10.8 you mentioned and indeed found the text can be successfully extracted, while switching to the latest version failed. I have forwarded this situation to our Dev team. If there is any update, we will inform you. Sorry for the inconvenience caused.

Sincerely,
Rachel
E-iceblue support team

rachel.lei
 
Posts: 1571
Joined: Tue Jul 09, 2019 2:22 am

Mon Jul 13, 2020 9:57 am

Hello,

Hope you are doing well.
I learned from our Dev team that in order to make our product more standardized and perfect, they have adjusted the code for the function of extracting text from a specified rectangular area. Currently when using newer versions of Spire.PDF, if you want the text to be extracted successfully, at least half of the text content should be in the specified area. And we noticed that the font size of the text you want to extract is 12. Therefore, you need to set the value of H to 6 or higher so that the text can be extracted as expected.
If there is any other question, just feel free to contact us.
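
The "at least half of the text" rule described above can be illustrated with a small sketch. This is not Spire.PDF code: RectCoverage, VerticalCoverage and WouldExtract are hypothetical names, and the sketch assumes that the text line starts at the top edge of the rectangle (Y = 54) and that its height equals the font size (12). The library's real calculation may differ.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of Spire.PDF: models the "at least half of the
// text must be inside the rectangle" rule for the vertical direction only.
static class RectCoverage
{
    // Fraction (0..1) of the text line's height that lies inside the area.
    // textTop and textHeight are assumed inputs; Spire.PDF does not expose them here.
    public static float VerticalCoverage(RectangleF area, float textTop, float textHeight)
    {
        float top = Math.Max(area.Top, textTop);
        float bottom = Math.Min(area.Bottom, textTop + textHeight);
        float overlap = Math.Max(0f, bottom - top);
        return textHeight > 0f ? overlap / textHeight : 0f;
    }

    // True when at least half of the text height is covered by the area.
    public static bool WouldExtract(RectangleF area, float textTop, float textHeight)
    {
        return VerticalCoverage(area, textTop, textHeight) >= 0.5f;
    }
}
```

Under these assumptions, a rectangle with H = 5 covers 5/12 of a 12-point line and falls below the threshold, while H = 6 covers exactly half and passes, which matches the behavior reported in this thread.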

Sincerely,
Rachel
E-iceblue support team

rachel.lei
 
Posts: 1571
Joined: Tue Jul 09, 2019 2:22 am

Mon Jul 13, 2020 2:40 pm

OK, I understand.

Could you give me one last piece of information?

You said the font size of the text in the rectangle was 12. Is there a way to get this information in code?

I know that the code below returns all fonts used in the PDF document. But how can I find out which font is used in one specific area?


PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile(@"E:\Program Files\Sample.pdf");
PdfUsedFont[] usedfont = pdf.UsedFonts;
foreach (PdfUsedFont font in usedfont)
{
    Console.WriteLine("{0}, {1}, {2}, {3}", font.Name, font.Size, font.Type, font.Style);
}

Dicipulofer
 
Posts: 24
Joined: Wed Apr 08, 2015 1:41 pm

Tue Jul 14, 2020 2:22 am

Hello,

Thanks for your response.
Sorry, Spire.PDF does not currently support getting the font information for text within a specified rectangular area. We have added it as a new feature to our upgrade list under ticket SPIREPDF-3413. If there is any definite news regarding its implementation, we will be pleased to update you on its availability.

Sincerely,
Rachel
E-iceblue support team

rachel.lei
 
Posts: 1571
Joined: Tue Jul 09, 2019 2:22 am
