With Spire.PDF, programmers can extract text from a specific rectangular area within a PDF document. This article demonstrates how to implement this function using Spire.PDF and C#.
The sample file we used for demonstration:
Detailed steps:
Step 1: Initialize an object of PdfDocument class and load the PDF file.
PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile("Stories.pdf");
Step 2: Get the first page.
PdfPageBase page = pdf.Pages[0];
Step 3: Extract text from a specific rectangular area within the page, then save the text to a .txt file.
string text = page.ExtractText(new RectangleF(50, 50, 500, 100));
StringBuilder sb = new StringBuilder();
sb.AppendLine(text);
File.WriteAllText("Extract.txt", sb.ToString());
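The four arguments of the RectangleF constructor are x, y, width and height, so the region in Step 3 starts at (50, 50) and is 500 wide and 100 high. If the area is known by its edges instead, a small helper can build the same rectangle. The sketch below uses only System.Drawing; the class and method names (RegionHelper, FromEdges) are hypothetical, and it assumes the coordinates are in the same units and use the same origin that ExtractText expects, which is worth checking against the Spire.PDF documentation.

```csharp
using System;
using System.Drawing;

static class RegionHelper
{
    // Hypothetical helper: builds the extraction region from its
    // left, top, right and bottom edges rather than x, y, width, height.
    public static RectangleF FromEdges(float left, float top, float right, float bottom)
    {
        return RectangleF.FromLTRB(left, top, right, bottom);
    }

    static void Main()
    {
        // Edges 50..550 horizontally and 50..150 vertically describe
        // the same region as new RectangleF(50, 50, 500, 100).
        RectangleF region = FromEdges(50f, 50f, 550f, 150f);
        Console.WriteLine(region == new RectangleF(50f, 50f, 500f, 100f)); // prints True
    }
}
```

The resulting RectangleF can be passed to page.ExtractText in place of the literal used in Step 3.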
Output:
Full code:
using Spire.Pdf;
using System.Drawing;
using System.IO;
using System.Text;

namespace ExtractText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Initialize an object of PdfDocument class
            PdfDocument pdf = new PdfDocument();

            //Load the PDF file
            pdf.LoadFromFile("Stories.pdf");

            //Get the first page
            PdfPageBase page = pdf.Pages[0];

            //Extract text from a specific rectangular area within the page
            string text = page.ExtractText(new RectangleF(50, 50, 500, 100));

            //Save the text to a .txt file
            StringBuilder sb = new StringBuilder();
            sb.AppendLine(text);
            File.WriteAllText("Extract.txt", sb.ToString());
        }
    }
}
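The same approach could be extended to read one region from every page, for example a header band that repeats throughout the document. The following is a minimal sketch, not part of the original sample: it assumes that pdf.Pages exposes a Count property alongside the indexer used above, and the class name and the output file name ExtractAll.txt are made up for illustration.

```csharp
using Spire.Pdf;
using System.Drawing;
using System.IO;
using System.Text;

namespace ExtractText
{
    class ExtractRegionFromAllPages
    {
        static void Main(string[] args)
        {
            //Load the PDF file
            PdfDocument pdf = new PdfDocument();
            pdf.LoadFromFile("Stories.pdf");

            //Use the same region on every page
            RectangleF region = new RectangleF(50, 50, 500, 100);
            StringBuilder sb = new StringBuilder();

            //Assumption: Pages.Count returns the number of pages
            for (int i = 0; i < pdf.Pages.Count; i++)
            {
                PdfPageBase page = pdf.Pages[i];
                sb.AppendLine(page.ExtractText(region));
            }

            //Save the collected text to a .txt file (hypothetical file name)
            File.WriteAllText("ExtractAll.txt", sb.ToString());
        }
    }
}
```

Here the StringBuilder collects one block of text per page before the file is written once at the end.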