Extract/Read Text from PDF in Java

This article demonstrates how to extract (read) text from a PDF file using Spire.PDF for Java.

The example PDF file:

[Image: the example PDF file]

Code snippet:

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;

import java.io.FileWriter;
import java.io.IOException;

public class Extract_Text {

    public static void main(String[] args) {

        // Create a PdfDocument instance
        PdfDocument doc = new PdfDocument();
        // Load the PDF file
        doc.loadFromFile("test.pdf");

        // Create a StringBuilder to collect the text of all pages
        StringBuilder sb = new StringBuilder();

        // Loop through the PDF pages and get the text of each page
        for (int i = 0; i < doc.getPages().getCount(); i++) {
            PdfPageBase page = doc.getPages().get(i);
            sb.append(page.extractText(true));
        }

        // Write the text to a .txt file; try-with-resources closes the writer
        try (FileWriter writer = new FileWriter("ExtractText.txt")) {
            writer.write(sb.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Release the resources held by the document
        doc.close();
    }
}
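
The example above appends the text of every page to one string and writes it with FileWriter, which uses the JVM's default charset (platform-dependent on older Java versions). The sketch below is one possible variation, not part of Spire.PDF: it puts a line separator between pages and always writes UTF-8. The class PageTextWriter and its methods are hypothetical names introduced for illustration, and the page strings are stand-ins for what page.extractText(true) would return.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; it does not use or belong to Spire.PDF.
final class PageTextWriter {

    // Joins the per-page strings with one line separator between pages.
    static String joinPages(List<String> pageTexts) {
        return String.join(System.lineSeparator(), pageTexts);
    }

    // Writes the joined text as UTF-8, independent of the default charset.
    static void write(Path target, List<String> pageTexts) {
        try {
            Files.write(target, joinPages(pageTexts).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in strings; in the example above they would come from page.extractText(true).
        List<String> pages = List.of("Text of page 1", "Text of page 2");
        write(Path.of("ExtractText.txt"), pages);
    }
}
```

Collecting the page texts in a list first keeps the page boundaries visible in the output file, which can help when the last line of one page would otherwise run into the first line of the next.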

Output:

[Image: the output file, ExtractText.txt]