Java: Extract Text from a Specific Area or a Particular Page of a PDF

This article shows how to extract text in Java, using the com.spire.pdf library, either from a specific rectangular area of a PDF page or from one particular page of a PDF document.

Extract text from a defined area of one page of a PDF:

import com.spire.pdf.*;
import java.awt.geom.Rectangle2D;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;

public class ExtractText {
    public static void main(String[] args) throws Exception{
        //Load the PDF file
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("sample.pdf");

        //Create a new txt file to save the extracted text
        String result = "output/extractTextFromSpecificArea.txt";
        File file = new File(result);
        //Make sure the output folder exists, then remove any previous result
        file.getParentFile().mkdirs();
        if (file.exists()) {
            file.delete();
        }
        file.createNewFile();
        FileWriter fw = new FileWriter(file, true);
        BufferedWriter bw = new BufferedWriter(fw);

        //Get the first page
        PdfPageBase page = pdf.getPages().get(0);

        //Extract text from a specific rectangular area within the page
        //Rectangle2D.Float arguments are (x, y, width, height)
        String text = page.extractText(new Rectangle2D.Float(80, 20, 500, 260));

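        //Write the extracted text to the txt file
        bw.write(text);

        //Flush and close the writers (closing bw also closes fw)
        bw.flush();
        bw.close();
        fw.close();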
    }
}
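
The four numbers passed to Rectangle2D.Float above are x, y, width and height. The following is a minimal sketch, not part of the original article, for checking such a region before extraction. It assumes the library measures the rectangle in PDF points (1/72 inch) from the top-left corner of the page, and that the page is A4 (about 595 x 842 points). The class name RegionSketch and the helpers mmToPoints and fitsWithin are hypothetical names introduced for illustration.

```java
import java.awt.geom.Rectangle2D;

public class RegionSketch {
    // Assumption: coordinates are PDF points, 72 points per inch
    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    // Convert a length in millimetres to points
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    // Return true if the area lies completely inside a page of the given size
    static boolean fitsWithin(Rectangle2D area, float pageWidth, float pageHeight) {
        Rectangle2D.Float page = new Rectangle2D.Float(0, 0, pageWidth, pageHeight);
        return page.contains(area);
    }

    public static void main(String[] args) {
        // Same region as in the example above: x=80, y=20, width=500, height=260
        Rectangle2D.Float area = new Rectangle2D.Float(80, 20, 500, 260);

        // Right and bottom edges of the region
        System.out.println(area.getMaxX()); // prints 580.0
        System.out.println(area.getMaxY()); // prints 280.0

        // Assumed A4 page size in points
        System.out.println(fitsWithin(area, 595f, 842f)); // prints true

        // A region given in millimetres, converted to points
        Rectangle2D.Float fromMm = new Rectangle2D.Float(
                mmToPoints(20), mmToPoints(10), mmToPoints(100), mmToPoints(50));
        System.out.println(fitsWithin(fromMm, 595f, 842f)); // prints true
    }
}
```

If the region extends past the page, adjust x, y, width or height before calling extractText. The real page size should be read from the document rather than assumed.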

[Screenshot: the text extracted from the specified area of the first page]

Extract text from a particular page of a PDF:

import com.spire.pdf.*;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;

public class ExtractText {
    public static void main(String[] args) throws Exception{
        //Load the PDF file
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("sample.pdf");

        //Create a new txt file to save the extracted text
        String result = "output/extractTextFromParticularPage.txt";
        File file = new File(result);
        //Make sure the output folder exists, then remove any previous result
        file.getParentFile().mkdirs();
        if (file.exists()) {
            file.delete();
        }
        file.createNewFile();
        FileWriter fw = new FileWriter(file, true);
        BufferedWriter bw = new BufferedWriter(fw);

        //Get the third page
        PdfPageBase page = pdf.getPages().get(2);

        //Extract text from the page, keeping white space
        String text = page.extractText(true);

        //Alternative: extract text from the page without keeping white space
        //String text = page.extractText(false);

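        //Write the extracted text to the txt file
        bw.write(text);

        //Flush and close the writers (closing bw also closes fw)
        bw.flush();
        bw.close();
        fw.close();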
    }
}
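
Both examples save the extracted string to a txt file by hand. As an alternative, here is a minimal sketch of the same saving step using java.nio.file and try-with-resources, so the writer is closed even if writing fails. It does not use the PDF library. The class name TextFileSketch, the method writeText and the sample string are hypothetical and stand in for the text returned by extractText.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TextFileSketch {
    // Write text to the target file, creating parent folders and
    // replacing any previous content of the file
    static void writeText(Path target, String text) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Default options create the file or truncate an existing one
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for the string returned by page.extractText(...)
        String text = "Sample extracted text";
        writeText(Paths.get("output", "extractTextFromParticularPage.txt"), text);
    }
}
```

Specifying UTF-8 explicitly avoids depending on the platform default encoding, which matters when the PDF contains non-ASCII characters.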

[Screenshot: the text extracted from the third page]