Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Tue Nov 26, 2019 1:43 pm

Dear E-IceBlue!
I read your tutorial "How to Extract Text from PDF Document in C#, VB.NET".
However, I need more information: an array of all text objects on a page, including where each one is located on the page (its bounds) and which font it uses.

For images, PdfPageBase already provides ImagesInfo, a PdfImageInfo[] array with useful properties for the Bounds and the Image data.

Do you have something similar for text objects on a page, or do you plan to implement it? It would really help.

Thank you.
BR,
ProCam

procamSpire
 
Posts: 22
Joined: Tue Sep 03, 2019 7:56 am

Wed Nov 27, 2019 12:11 pm

Hi,

Thanks for your inquiry.
There is a method, FindAllText, which returns a collection of all text on a page together with its bounds. Unfortunately, it does not include font information at present. I have forwarded your requirement to our Dev team, and we will let you know once there is an update. Below is the code for you.

Code:
            PdfDocument doc = new PdfDocument(@"……/filename.pdf");

            //Get a collection of all text with bounds.
            PdfTextFindCollection allTextFind = doc.Pages[0].FindAllText();

            foreach (PdfTextFind find in allTextFind.Finds)
            {
                Console.WriteLine(find.Bounds);
            }


Best wishes,
Amber
E-iceblue support team

Amber.Gu
 
Posts: 525
Joined: Tue Jun 04, 2019 3:16 am

Mon Dec 16, 2019 10:46 am

Hi,

Hope you are doing well.
Glad to inform you that a new "FontName" property is now available on the results returned by "FindAllText" in Spire.PDF Pack (Hot Fix) Version 5.12.15. You are welcome to download and test it from the following links.
Our website link: https://www.e-iceblue.com/Download/download-pdf-for-net-now.html
NuGet link: https://www.nuget.org/packages/Spire.PDF/5.12.15

And below is the code for you.
Code:
            PdfDocument doc = new PdfDocument(@"..\filename.pdf");

            //Get a collection of all text with font information
            PdfTextFindCollection allTextFind = doc.Pages[0].FindAllText();

            foreach (PdfTextFind find in allTextFind.Finds)
            {
                Console.WriteLine(find.FontName);
            }

Best wishes,
Amber
E-iceblue support team
Amber.Gu

Wed Dec 25, 2019 10:03 am

Hi,

Greetings from E-iceblue.
Have you tried Spire.PDF Pack(Hot Fix) Version:5.12.15? Does it solve your issue?
Could you please give us some feedback at your convenience?

Best wishes,
Amber
E-iceblue support team
Amber.Gu

Wed Jan 08, 2020 9:55 am

Dear IceBlue!
Yes, it is working well.
I now read these properties from each PdfTextFind object (find):
find.MatchText, find.FontName, find.Position.X, find.Position.Y, find.Size.Height.
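
For reference, reading these values for every page of a document might look roughly like the sketch below. It only uses the members already mentioned in this thread (FindAllText, Finds, MatchText, FontName, Position, Size) and needs Spire.PDF 5.12.15 or later for FontName. The class name TextInfoDump, the default file name and the namespace Spire.Pdf.General.Find are assumptions, so adjust them to your project.

```csharp
using System;
using Spire.Pdf;
using Spire.Pdf.General.Find; // assumed namespace of PdfTextFind / PdfTextFindCollection

class TextInfoDump // hypothetical class name, for illustration only
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "filename.pdf";
        PdfDocument doc = new PdfDocument(path);

        // Walk every page instead of only Pages[0].
        for (int i = 0; i < doc.Pages.Count; i++)
        {
            // Collection of all text fragments found on this page.
            PdfTextFindCollection allTextFind = doc.Pages[i].FindAllText();

            foreach (PdfTextFind find in allTextFind.Finds)
            {
                // Text, font name, position and height of each fragment.
                Console.WriteLine(
                    "page {0}: \"{1}\" font={2} x={3} y={4} height={5}",
                    i + 1, find.MatchText, find.FontName,
                    find.Position.X, find.Position.Y, find.Size.Height);
            }
        }

        doc.Close();
    }
}
```

The output format is arbitrary; the same loop could just as well fill a list of your own record type.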
Thank you!
BR
Procam

PS: To have it 100%, these are still missing: fontColor, FontWeight, FontStyle, TextDecoration, TextAlignment, BaselineAlignment ;)

procamSpire

Thu Jan 09, 2020 1:52 am

Hi,

Thanks for your reply.
We will consider adding these new features into our upgrade schedule. Once there is any progress, we will inform you.

Best wishes,
Amber
E-iceblue support team
Amber.Gu
