Tables are among the most commonly used formatting elements in PDF documents. In some cases, you may need to extract the data from PDF tables for further analysis. In this article, you will learn how to do this programmatically in Java using Spire.PDF for Java.
Install Spire.PDF for Java
First, add the Spire.PDF.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the library by adding the following configuration to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.9.0</version>
    </dependency>
</dependencies>
Extract Table Data from PDF Document
Spire.PDF for Java provides the PdfTableExtractor.extractTable(int pageIndex) method, which detects and extracts the tables on a given PDF page.
The following are the steps to extract table data from a PDF file:
- Load a sample PDF document using the PdfDocument class.
- Create a StringBuilder instance and a PdfTableExtractor instance.
- Loop through the pages in the PDF and extract the tables from each page into a PdfTable array using the PdfTableExtractor.extractTable(int pageIndex) method.
- Loop through the tables in the array.
- Loop through the rows and columns in each table. Extract the data from each cell using the PdfTable.getText(int rowIndex, int columnIndex) method and append it to the StringBuilder instance using the StringBuilder.append() method.
- Write the extracted data to a .txt document using the Writer.write() method.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;

import java.io.FileWriter;

public class ExtractTableData {
    public static void main(String[] args) throws Exception {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("Sample.pdf");

        //Create a StringBuilder instance
        StringBuilder builder = new StringBuilder();

        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);

        //Loop through the pages in the PDF
        for (int pageIndex = 0; pageIndex < pdf.getPages().getCount(); pageIndex++) {

            //Extract tables from the current page into a PdfTable array
            PdfTable[] tableLists = extractor.extractTable(pageIndex);

            //If any tables are found
            if (tableLists != null && tableLists.length > 0) {

                //Loop through the tables in the array
                for (PdfTable table : tableLists) {

                    //Loop through the rows in the current table
                    for (int i = 0; i < table.getRowCount(); i++) {

                        //Loop through the columns in the current table
                        for (int j = 0; j < table.getColumnCount(); j++) {

                            //Extract data from the current table cell and append to the StringBuilder
                            String text = table.getText(i, j);
                            builder.append(text + " | ");
                        }
                        builder.append("\r\n");
                    }
                }
            }
        }

        //Write data into a .txt document
        FileWriter fw = new FileWriter("ExtractTable.txt");
        fw.write(builder.toString());
        fw.flush();
        fw.close();
    }
}
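The listing above joins cells with " | ", which leaves a trailing separator on every row and does not protect cells whose text itself contains a separator or a line break. If the extracted data is headed for a spreadsheet or another tool, one option is to write it as CSV instead. The following is a minimal sketch of that idea in plain Java. The class TableCsvFormatter and its methods are hypothetical helpers, not part of Spire.PDF, and the String[][] input is assumed to be filled by you from PdfTable.getText(i, j).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spire.PDF): formats cell text as CSV lines.
public class TableCsvFormatter {

    // Quote a cell using the common CSV convention: wrap it in double quotes
    // when it contains a comma, a double quote, or a line break, and double
    // any embedded double quotes. A null cell becomes an empty field.
    public static String escapeCell(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        String escaped = cell.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join one row of cells into a single CSV line without a trailing separator.
    public static String toCsvLine(String[] row) {
        List<String> cells = new ArrayList<>();
        for (String cell : row) {
            cells.add(escapeCell(cell));
        }
        return String.join(",", cells);
    }

    // Convert a whole table (rows of cells) into CSV text, one line per row.
    public static String toCsv(String[][] table) {
        StringBuilder builder = new StringBuilder();
        for (String[] row : table) {
            builder.append(toCsvLine(row)).append("\r\n");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; in practice these values would come from PdfTable.getText(i, j)
        String[][] table = {
            {"Name", "Capital"},
            {"Korea, Republic of", "Seoul"}
        };
        System.out.print(toCsv(table));
    }
}
```

To use it with the extraction loop, collect the text of each row into a String[] and append toCsvLine(row) to the StringBuilder in place of the " | " concatenation.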
The input PDF:
The output .txt document with extracted table data:
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license.