Extract Text from a Specific Rectangular Area in PDF using C#

With Spire.PDF, programmers can extract text from a specific rectangular area within a PDF document. This article demonstrates how to implement this function using Spire.PDF and C#.

The sample file we used for demonstration:

[Image: screenshot of the sample PDF file]

Detailed steps:

Step 1: Create an object of the PdfDocument class and load the PDF file.

PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile("Stories.pdf");

Step 2: Get the first page.

PdfPageBase page = pdf.Pages[0];

Step 3: Extract text from a specific rectangular area within the page, then save the text to a .txt file.

string text = page.ExtractText(new RectangleF(50, 50, 500, 100));
StringBuilder sb = new StringBuilder();
sb.AppendLine(text);
File.WriteAllText("Extract.txt", sb.ToString());
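The numbers passed to RectangleF are easier to choose when they can be derived from a physical measurement on the page. The following is a minimal sketch of such a conversion, not part of Spire.PDF: RegionHelper and FromMillimeters are hypothetical names, and the sketch assumes that the rectangle is expressed in PDF points (1/72 inch) in the order x, y, width, height. Check the Spire.PDF documentation for the unit and origin that your version uses.

```csharp
using System;
using System.Drawing;

// Hypothetical helper for illustration; it is not part of Spire.PDF.
public static class RegionHelper
{
    // 1 inch = 25.4 mm = 72 points, so 1 mm = 72 / 25.4 points.
    // Assumption: the extraction rectangle is measured in points.
    private const float PointsPerMillimeter = 72f / 25.4f;

    // Converts a region measured in millimeters into a RectangleF in points.
    public static RectangleF FromMillimeters(float x, float y, float width, float height)
    {
        if (width <= 0 || height <= 0)
        {
            throw new ArgumentException("Width and height must be positive.");
        }

        return new RectangleF(
            x * PointsPerMillimeter,
            y * PointsPerMillimeter,
            width * PointsPerMillimeter,
            height * PointsPerMillimeter);
    }
}
```

Under these assumptions, the call in Step 3 could be written as page.ExtractText(RegionHelper.FromMillimeters(20, 20, 170, 35)) for a region that starts 20 mm from the left and top edges of the page.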

Output:

[Image: screenshot of the extracted text]

Full code:

using Spire.Pdf;
using System.Drawing;
using System.IO;
using System.Text;

namespace ExtractText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Initialize an object of PdfDocument class
            PdfDocument pdf = new PdfDocument();
            //Load the PDF file
            pdf.LoadFromFile("Stories.pdf");

            //Get the first page
            PdfPageBase page = pdf.Pages[0];

            //Extract text from a specific rectangular area within the page
            string text = page.ExtractText(new RectangleF(50, 50, 500, 100));

            //Save the text to a .txt file
            StringBuilder sb = new StringBuilder();
            sb.AppendLine(text);
            File.WriteAllText("Extract.txt", sb.ToString());
        }
    }
}
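
The code above reads only the first page. When the same region holds the text of interest on every page (a running header, for example), the extraction can be repeated in a loop. The following is a sketch under assumptions rather than a tested program: it reuses the calls shown in this article, and it assumes that pdf.Pages exposes a Count property and that PdfDocument provides a Close method. The class name ExtractFromAllPages and the output file name ExtractAllPages.txt are made up for this example.

```csharp
using Spire.Pdf;
using System.Drawing;
using System.IO;
using System.Text;

namespace ExtractText
{
    class ExtractFromAllPages
    {
        static void Main(string[] args)
        {
            //Load the PDF file
            PdfDocument pdf = new PdfDocument();
            pdf.LoadFromFile("Stories.pdf");

            //Use the same rectangular area as in the article
            RectangleF region = new RectangleF(50, 50, 500, 100);

            //Extract the text inside the area from each page in turn
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < pdf.Pages.Count; i++) //assumes Pages.Count is available
            {
                PdfPageBase page = pdf.Pages[i];
                sb.AppendLine(page.ExtractText(region));
            }

            //Save the collected text to a .txt file
            File.WriteAllText("ExtractAllPages.txt", sb.ToString());

            //Release the document (assumes PdfDocument.Close exists)
            pdf.Close();
        }
    }
}
```

Each page contributes one block of text followed by a line break, so the output file lists the regions in page order.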