Spire.PDF is a professional PDF library for creating, writing, editing, handling and reading PDF files without any external dependencies. Get free, professional technical support for Spire.PDF for .NET, Java, Android, C++ and Python.

Tue Jun 01, 2021 11:47 am

I'm considering using Spire.PDF to extract text, image and graphic content from PDF files. I want to get the position and size of that content on each PDF page.
How can I get this information?
I would be very grateful for some example code.

junichi.matsunoshita@fujixerox.co.jp
 
Posts: 12
Joined: Thu Jul 04, 2019 3:54 am

Wed Jun 02, 2021 7:43 am

Hello,

Thank you for your inquiry.
Please refer to the following code, which gets the position and size of the text and images on a page. If you have any questions, please feel free to write back.

using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;
using Spire.Pdf;
using Spire.Pdf.General.Find;

PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile(inputFile);   //inputFile: path to the source PDF
PdfPageBase page = pdf.Pages[0];
StringBuilder builder = new StringBuilder();

//Read ImagesInfo once and reuse it inside the loop
var imagesInfo = page.ImagesInfo;
for (int i = 0; i < imagesInfo.Length; i++)
{
    //Get image location
    float x = imagesInfo[i].Bounds.Location.X;
    float y = imagesInfo[i].Bounds.Location.Y;
    //Get image size
    float width = imagesInfo[i].Bounds.Width;
    float height = imagesInfo[i].Bounds.Height;

    //Save the image and record its bounds
    string imageFileName = string.Format("Image-{0}.png", i);
    Image image = imagesInfo[i].Image;
    image.Save(imageFileName, ImageFormat.Png);
    builder.AppendLine(imageFileName + " == x: " + x + " y: " + y + " width: " + width + " height: " + height);
}

//FindAllText returns every text fragment on the page
PdfTextFindCollection collection = page.FindAllText();
foreach (PdfTextFind find in collection.Finds)
{
    //Get text content
    string str = find.MatchText;
    //Get text position
    PointF point = find.Position;
    //Get text size
    SizeF size = find.Size;
    builder.AppendLine(str + " == " + point + " " + size);
}
File.WriteAllText("result.txt", builder.ToString());
pdf.Close();
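The Bounds values above are, as far as we know, measured in points from the top-left corner of the page with y growing downward (treat this as an assumption and check it against your Spire.PDF version), whereas the PDF format itself puts the origin at the bottom-left. If another tool expects native PDF coordinates, the rectangle only needs flipping against the page height. A minimal sketch in plain C# using only System.Drawing primitives; the names PdfCoords and TopLeftToBottomLeft are hypothetical, and the page height would come from something like page.Size.Height (also an assumption):

```csharp
using System;
using System.Drawing;

static class PdfCoords
{
    // Hypothetical helper: converts a rectangle measured from the top-left
    // corner (y grows downward) into bottom-left-origin PDF coordinates
    // (y grows upward). All values are in points.
    public static RectangleF TopLeftToBottomLeft(RectangleF r, float pageHeight)
    {
        // The rectangle's lower edge is r.Y + r.Height from the top,
        // i.e. pageHeight - (r.Y + r.Height) from the bottom.
        return new RectangleF(r.X, pageHeight - r.Y - r.Height, r.Width, r.Height);
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical 100 x 50 pt image placed 72 pt from the left and
        // 144 pt from the top of an A4 page (842 pt tall).
        var bounds = new RectangleF(72f, 144f, 100f, 50f);
        RectangleF pdfRect = PdfCoords.TopLeftToBottomLeft(bounds, 842f);
        Console.WriteLine(pdfRect);
    }
}
```

Only the y value changes; x, width and height are the same in both systems.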

Sincerely,
Annika
E-iceblue support team

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Thu Jun 03, 2021 12:17 pm

Hi, Annika
Thank you for example code. I tried the code and could get position and size of texts and images from PDF file, but I still have a problem that I can't get graphics object data such as rectangle shape or illustration which is not raster images and position/size information of them. How can I get them or Can I get them?
If you give me some advice or sample code, I appreciate you.

junichi.matsunoshita@fujixerox.co.jp
 

Fri Jun 04, 2021 10:02 am

Hello,

Thank you for your feedback.
Sorry, Spire.PDF does not currently support getting the location and size of shapes or illustrations. If you have any other questions, please feel free to write back.

Sincerely,
Annika
E-iceblue support team

Annika.Zhou
 

Fri Jun 04, 2021 11:56 am

Hello.
Thank you for your reply. I'm sorry to hear that Spire.PDF doesn't support this. I'll look for another way.

junichi.matsunoshita@fujixerox.co.jp
 

Mon Jun 07, 2021 1:37 am

Hello,

Sorry we couldn't help you this time. If you have other questions about our products in the future, please feel free to contact us.

Sincerely,
Annika
E-iceblue support team

Annika.Zhou
 

Thu Apr 28, 2022 3:24 am

Hi there,

May I know how Spire.PDF can get the position coordinates of the text that we have extracted from PDF files?

elizabethC
 
Posts: 10
Joined: Thu Apr 28, 2022 3:15 am

Thu Apr 28, 2022 5:58 am

Hello,

Thank you for your inquiry.
Spire.PDF gets the position coordinates by searching for the text on each page. Here is some sample code:
using System.Collections.Generic;
using System.Drawing;
using Spire.Pdf;
using Spire.Pdf.General.Find;

PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile("inputFile");   //Replace with the path to your PDF
foreach (PdfPageBase page in pdf.Pages)
{
    //Find the text; CrossLine allows matches that span more than one line
    PdfTextFind[] result = page.FindText("searchText", TextFindParameter.CrossLine).Finds;
    foreach (PdfTextFind find in result)
    {
        //Get the position coordinates of text that does not cross lines
        PointF pointF = find.Position;
        //Get the position coordinates of text that crosses lines
        List<PointF> pointFs = find.Positions;
    }
}
pdf.Close();
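For a match that crosses lines, Positions holds several points rather than one, so there is no single box for the whole match. If you can build one rectangle per line piece (from each entry of Positions plus a matching size; whether your Spire.PDF version exposes per-line sizes is an assumption to check), RectangleF.Union from System.Drawing can merge them into one enclosing box. A minimal sketch; the names TextBounds and Enclose are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

static class TextBounds
{
    // Hypothetical helper: returns the smallest rectangle enclosing every
    // line piece of a match that wraps across lines.
    public static RectangleF Enclose(IReadOnlyList<RectangleF> fragments)
    {
        if (fragments.Count == 0)
            throw new ArgumentException("At least one fragment is required.", nameof(fragments));

        RectangleF box = fragments[0];
        for (int i = 1; i < fragments.Count; i++)
            box = RectangleF.Union(box, fragments[i]);   // grow to cover each piece
        return box;
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical pieces: the match ends one line and continues on the next.
        var fragments = new List<RectangleF>
        {
            new RectangleF(400f, 100f, 120f, 12f),
            new RectangleF(72f, 114f, 60f, 12f),
        };
        Console.WriteLine(TextBounds.Enclose(fragments));
    }
}
```

The union box can cover text that is not part of the match (here, the start of the second line's left margin up to the first piece), so keep the individual rectangles if you need precise highlighting.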

Sincerely,
Annika
E-iceblue support team

Annika.Zhou
 
