Java: Export Table Data from PDF to Excel

Tables are commonly seen in PDF invoices and financial reports. You may encounter the situation where you need to export PDF table data into Excel, so that you can analyze the data using the tools provided by MS Excel. This article explains how to extract tables from a PDF page and export them as individual Excel worksheets using Spire.Office for Java.

Install Spire.Office for Java

The scenario actually uses Spire.PDF for Java for extracting tables from PDF, and Spire.XLS for Java for generating Excel files. In order to use them in the same project, you’ll need to add the Spire.Office.jar file as a dependency in your Java program.

The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project’s pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.office</artifactId>
        <version>9.3.1</version>
    </dependency>
</dependencies>
    

Export Table Data from PDF to Excel

The following are the main steps to extract all tables from a certain page and save each of them as an individual worksheet in an Excel document.

  • Load a sample PDF document while initializing the PdfDocument object.
  • Create a PdfTableExtractor object, and call extactTable(int pageIndex) method under it to extract all tables in the first page.
  • Create a Workbook instance.
  • Loop through the tables in the PdfTable[] array, and get the specific one by its index.
  • Add a worksheet to the workbook using Workbook.getWorksheets.add() method.
  • Loop through the cells in the PDF table, and get the value of a specific cell using PdfTable.getText(int rowIndex, int columnIndex) method. Then insert the value to the worksheet using Worksheet.get(int row, int column).setText(String string) method.
  • Save the workbook to an Excel document using Workbook.saveToFile() method.
  • Java
import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractTableDataAndSaveInExcel {

    public static void main(String[] args) {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("C:\\Users\\Administrator\\Desktop\\Tables.pdf");

        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);
        
        //Extract tables from the first page
        PdfTable[] pdfTables  = extractor.extractTable(0);

        //Create a Workbook object,
        Workbook wb = new Workbook();

        //Remove default worksheets
        wb.getWorksheets().clear();

        //If any tables are found
        if (pdfTables != null && pdfTables.length > 0) {

            //Loop through the tables
            for (int tableNum = 0; tableNum < pdfTables.length; tableNum++) {

                //Add a worksheet to workbook
                String sheetName = String.format("Table - %d", tableNum + 1);
                Worksheet sheet = wb.getWorksheets().add(sheetName);

                //Loop through the rows in the current table
                for (int rowNum = 0; rowNum < pdfTables[tableNum].getRowCount(); rowNum++) {

                    //Loop through the columns in the current table
                    for (int colNum = 0; colNum < pdfTables[tableNum].getColumnCount(); colNum++) {

                        //Extract data from the current table cell
                        String text = pdfTables[tableNum].getText(rowNum, colNum);

                        //Insert data into a specific cell
                        sheet.get(rowNum + 1, colNum + 1).setText(text);

                    }
                }

                //Auto fit column width
                for (int sheetColNum = 0; sheetColNum < sheet.getColumns().length; sheetColNum++) {
                    sheet.autoFitColumn(sheetColNum + 1);
                }
            }
        }

        //Save the workbook to an Excel file
        wb.saveToFile("output/ExportTableToExcel.xlsx", ExcelVersion.Version2016);
    }
}

Java: Export Table Data from PDF to Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.