This article shows how to extract text from a specific rectangular area of a PDF page, or from one particular page of a PDF document, using Spire.PDF for Java.
Extract text from a defined area of a PDF page:
import com.spire.pdf.*;
import java.awt.geom.Rectangle2D;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;

public class ExtractText {
    public static void main(String[] args) throws Exception {
        // Load the PDF file
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("sample.pdf");

        // Create a new txt file to save the extracted text
        String result = "output/extractTextFromSpecificArea.txt";
        File file = new File(result);
        // Make sure the output folder exists and remove any previous result file
        file.getParentFile().mkdirs();
        if (file.exists()) {
            file.delete();
        }
        file.createNewFile();
        FileWriter fw = new FileWriter(file, true);
        BufferedWriter bw = new BufferedWriter(fw);

        // Get the first page (page indexes are zero-based)
        PdfPageBase page = pdf.getPages().get(0);

        // Extract text from a specific rectangular area within the page
        // Rectangle2D.Float arguments: x, y, width, height
        String text = page.extractText(new Rectangle2D.Float(80, 20, 500, 260));

        bw.write(text);
        bw.flush();
        bw.close();
        fw.close();
    }
}
Screenshot of the result: the text extracted from the defined area.
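The rectangle passed to extractText is an ordinary java.awt.geom.Rectangle2D.Float, whose constructor takes x, y, width and height. The sketch below shows one way to build such a rectangle from measurements in inches. It assumes the library interprets the values as PDF points (72 per inch), which this article does not state, so check it against your version of the library. The class ExtractionArea and the method fromInches are hypothetical helpers, not part of Spire.PDF.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of Spire.PDF.
final class ExtractionArea {
    // Assumption: coordinates are measured in PDF points, 72 points per inch.
    static final float POINTS_PER_INCH = 72f;

    private ExtractionArea() {
    }

    // Builds a rectangle (x, y, width, height) from values given in inches.
    static Rectangle2D.Float fromInches(float x, float y, float width, float height) {
        return new Rectangle2D.Float(
                x * POINTS_PER_INCH,
                y * POINTS_PER_INCH,
                width * POINTS_PER_INCH,
                height * POINTS_PER_INCH);
    }
}
```

Under that assumption, ExtractionArea.fromInches(1f, 0.5f, 4f, 2f) produces a rectangle with x = 72, y = 36, width = 288 and height = 144, which could be passed to page.extractText in place of the hard-coded values.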
Extract text from a particular page of a PDF:
import com.spire.pdf.*;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;

public class ExtractText {
    public static void main(String[] args) throws Exception {
        // Load the PDF file
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("sample.pdf");

        // Create a new txt file to save the extracted text
        String result = "output/extractTextFromParticularPage.txt";
        File file = new File(result);
        // Make sure the output folder exists and remove any previous result file
        file.getParentFile().mkdirs();
        if (file.exists()) {
            file.delete();
        }
        file.createNewFile();
        FileWriter fw = new FileWriter(file, true);
        BufferedWriter bw = new BufferedWriter(fw);

        // Get the third page (page indexes are zero-based)
        PdfPageBase page = pdf.getPages().get(2);

        // Extract text from the page, keeping white space
        String text = page.extractText(true);

        // Extract text from the page without keeping white space
        //String text = page.extractText(false);

        bw.write(text);
        bw.flush();
        bw.close();
        fw.close();
    }
}
Screenshot of the result: the text extracted from the third page.
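Both examples write the extracted string to a file with File, FileWriter and BufferedWriter. As an alternative, the following minimal sketch does the same job with java.nio.file.Files, which creates the file or overwrites an existing one in a single call. The class TextFileWriter and the method writeText are hypothetical names used only for illustration, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for saving extracted text; not part of Spire.PDF.
final class TextFileWriter {
    private TextFileWriter() {
    }

    // Writes the text to the target file as UTF-8 (an assumed encoding).
    // Missing parent folders are created first; an existing file is replaced.
    static Path writeText(Path target, String text) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // With no options given, Files.write creates the file or truncates it.
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

In the examples above, the file handling could then be reduced to a call such as TextFileWriter.writeText(Paths.get("output/extractTextFromParticularPage.txt"), text), with java.nio.file.Paths imported.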