This article demonstrates how to extract (read) text from a PDF file using Spire.PDF for Java.
The example PDF file:
Code snippets:
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import java.io.*;

public class Extract_Text {
    public static void main(String[] args) {
        //Create a PdfDocument instance
        PdfDocument doc = new PdfDocument();
        //Load the PDF file
        doc.loadFromFile("test.pdf");
        //Create a StringBuilder instance
        StringBuilder sb = new StringBuilder();
        PdfPageBase page;
        //Loop through PDF pages and get text of each page
        for (int i = 0; i < doc.getPages().getCount(); i++) {
            page = doc.getPages().get(i);
            sb.append(page.extractText(true));
        }
        //Write text into a .txt file; try-with-resources closes the writer even if writing fails
        try (FileWriter writer = new FileWriter("ExtractText.txt")) {
            writer.write(sb.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }
        //Release the resources held by the document
        doc.close();
    }
}
Output:
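A note on the file-writing step in the code snippet: new FileWriter("ExtractText.txt") encodes text with the JVM's default charset, which on Java versions before 18 depends on the platform, so non-ASCII characters extracted from a PDF may not survive on every machine. The sketch below shows one possible alternative that writes the collected text as UTF-8 using only the standard library. PageTextWriter and writePages are hypothetical names introduced for illustration and are not part of Spire.PDF; the strings in main stand in for the values that page.extractText(true) would return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Spire.PDF): it only handles plain strings.
class PageTextWriter {

    // Concatenates the per-page text in order and writes it to 'target' as UTF-8.
    // Files.write creates the file or truncates an existing one, and closes it afterwards.
    static void writePages(List<String> pageTexts, Path target) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String pageText : pageTexts) {
            sb.append(pageText);
        }
        Files.write(target, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in strings (assumption); in the article's example these would be
        // the results of page.extractText(true) for each page.
        List<String> pages = Arrays.asList("Text of page 1\n", "Text of page 2\n");
        writePages(pages, Paths.get("ExtractText.txt"));
    }
}
```

Collecting the page text into a list first also keeps the PDF handling and the file output separate, so the writing step can be checked without a PDF at hand.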