Table (4)
Tables are common in PDF invoices and financial reports. To analyze that data with the tools provided by MS Excel, you first need to export the table data from the PDF. This article explains how to extract tables from a PDF page and export each of them as an individual Excel worksheet using Spire.Office for Java.
Install Spire.Office for Java
This scenario uses Spire.PDF for Java to extract tables from the PDF and Spire.XLS for Java to generate the Excel file. To use both in the same project, add the Spire.Office.jar file as a dependency in your Java program.
The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project’s pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.office</artifactId>
        <version>9.3.1</version>
    </dependency>
</dependencies>
Export Table Data from PDF to Excel
The following are the main steps to extract all tables from a specific page and save each of them as an individual worksheet in an Excel document.
- Load a sample PDF document while initializing the PdfDocument object.
- Create a PdfTableExtractor object, and call its extractTable(int pageIndex) method to extract all tables from the first page.
- Create a Workbook instance.
- Loop through the tables in the PdfTable[] array, and get the specific one by its index.
- Add a worksheet to the workbook using the Workbook.getWorksheets().add() method.
- Loop through the cells in the PDF table, and get the value of each cell using the PdfTable.getText(int rowIndex, int columnIndex) method. Then insert the value into the worksheet using the Worksheet.get(int row, int column).setText(String string) method.
- Save the workbook to an Excel document using Workbook.saveToFile() method.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractTableDataAndSaveInExcel {
    public static void main(String[] args) {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("C:\\Users\\Administrator\\Desktop\\Tables.pdf");

        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);

        //Extract tables from the first page
        PdfTable[] pdfTables = extractor.extractTable(0);

        //Create a Workbook object
        Workbook wb = new Workbook();

        //Remove default worksheets
        wb.getWorksheets().clear();

        //If any tables are found
        if (pdfTables != null && pdfTables.length > 0) {

            //Loop through the tables
            for (int tableNum = 0; tableNum < pdfTables.length; tableNum++) {

                //Add a worksheet to workbook
                String sheetName = String.format("Table - %d", tableNum + 1);
                Worksheet sheet = wb.getWorksheets().add(sheetName);

                //Loop through the rows in the current table
                for (int rowNum = 0; rowNum < pdfTables[tableNum].getRowCount(); rowNum++) {

                    //Loop through the columns in the current table
                    for (int colNum = 0; colNum < pdfTables[tableNum].getColumnCount(); colNum++) {

                        //Extract data from the current table cell
                        String text = pdfTables[tableNum].getText(rowNum, colNum);

                        //Insert data into a specific cell
                        sheet.get(rowNum + 1, colNum + 1).setText(text);
                    }
                }

                //Auto fit column width
                for (int sheetColNum = 0; sheetColNum < sheet.getColumns().length; sheetColNum++) {
                    sheet.autoFitColumn(sheetColNum + 1);
                }
            }
        }

        //Save the workbook to an Excel file
        wb.saveToFile("output/ExportTableToExcel.xlsx", ExcelVersion.Version2016);
    }
}
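The listing reads cells with zero-based indices (its loops start at 0) but writes them with `rowNum + 1` and `colNum + 1`, which suggests that the worksheet side is one-based. The following is a minimal, library-free sketch of that offset: it turns zero-based indices into the A1-style reference Excel would display. The class `CellRef` and the method `toA1` are hypothetical names used for illustration only and are not part of Spire.XLS.

```java
// Hypothetical helper for illustration only; not part of Spire.XLS.
class CellRef {

    // Converts zero-based row/column indices into an A1-style reference.
    static String toA1(int rowIndex, int colIndex) {
        if (rowIndex < 0 || colIndex < 0) {
            throw new IllegalArgumentException("indices must be non-negative");
        }
        StringBuilder letters = new StringBuilder();
        int col = colIndex + 1; // shift to a one-based column number
        while (col > 0) {
            int remainder = (col - 1) % 26;
            letters.insert(0, (char) ('A' + remainder));
            col = (col - 1) / 26;
        }
        // Rows are shifted by one as well.
        return letters.toString() + (rowIndex + 1);
    }

    public static void main(String[] args) {
        // The first PDF table cell (0, 0) maps to the top-left worksheet cell.
        System.out.println(toA1(0, 0));
        System.out.println(toA1(2, 27));
    }
}
```

Under that assumption, the cell read with `getText(0, 0)` ends up in worksheet cell A1.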
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Tables are one of the most commonly used formatting elements in PDF documents. In some cases, you may need to extract data from PDF tables for further analysis. In this article, you will learn how to do this programmatically in Java using Spire.PDF for Java.
Install Spire.PDF for Java
First of all, you're required to add the Spire.PDF.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.4.4</version>
    </dependency>
</dependencies>
Extract Table Data from PDF Document
Spire.PDF for Java uses the PdfTableExtractor.extractTable(int pageIndex) method to detect and extract tables from a desired PDF page.
The following are the steps to extract table data from a PDF file:
- Load a sample PDF document using the PdfDocument class.
- Create a StringBuilder instance and a PdfTableExtractor instance.
- Loop through the pages in the PDF, and extract the tables from each page into a PdfTable array using the PdfTableExtractor.extractTable(int pageIndex) method.
- Loop through the tables in the array.
- Loop through the rows and columns in each table, extract the data from each cell using the PdfTable.getText(int rowIndex, int columnIndex) method, and append it to the StringBuilder instance using the StringBuilder.append() method.
- Write the extracted data to a .txt document using the Writer.write() method.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;

import java.io.FileWriter;

public class ExtractTableData {
    public static void main(String[] args) throws Exception {

        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("Sample.pdf");

        //Create a StringBuilder instance
        StringBuilder builder = new StringBuilder();

        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);

        //Loop through the pages in the PDF
        for (int pageIndex = 0; pageIndex < pdf.getPages().getCount(); pageIndex++) {

            //Extract tables from the current page into a PdfTable array
            PdfTable[] tableLists = extractor.extractTable(pageIndex);

            //If any tables are found
            if (tableLists != null && tableLists.length > 0) {

                //Loop through the tables in the array
                for (PdfTable table : tableLists) {

                    //Loop through the rows in the current table
                    for (int i = 0; i < table.getRowCount(); i++) {

                        //Loop through the columns in the current table
                        for (int j = 0; j < table.getColumnCount(); j++) {

                            //Extract data from the current table cell and append to the StringBuilder
                            String text = table.getText(i, j);
                            builder.append(text + " | ");
                        }
                        builder.append("\r\n");
                    }
                }
            }
        }

        //Write data into a .txt document; the writer is closed automatically
        try (FileWriter fw = new FileWriter("ExtractTable.txt")) {
            fw.write(builder.toString());
        }
    }
}
The input PDF:
The output .txt document with extracted table data:
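Because the listing appends " | " after every cell, each output line ends with a trailing separator. If that is unwanted, the row text can be assembled with `String.join` instead. The sketch below assumes the cell values have already been copied into a plain `String[][]`; `RowFormatter` and its methods are hypothetical names, and the trimming of cell text is an added assumption rather than something the original code does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it works on plain strings.
class RowFormatter {

    // Joins the cells of one row with " | " and no trailing separator.
    static String formatRow(String[] cells) {
        List<String> cleaned = new ArrayList<>();
        for (String cell : cells) {
            // Treat a missing cell as empty and trim stray whitespace.
            cleaned.add(cell == null ? "" : cell.trim());
        }
        return String.join(" | ", cleaned);
    }

    // Formats every row and terminates each one with a line separator.
    static String formatTable(String[][] rows) {
        StringBuilder builder = new StringBuilder();
        for (String[] row : rows) {
            builder.append(formatRow(row)).append(System.lineSeparator());
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"ID", "Name", "Level"},
            {"1", " David ", "1"}
        };
        System.out.print(formatTable(rows));
    }
}
```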
In this article, you'll learn how to add a grid to a PDF document and how to format the grid using Spire.PDF for Java.
import com.spire.pdf.*;
import com.spire.pdf.graphics.*;
import com.spire.pdf.grid.PdfGrid;

import java.awt.*;

public class CreateGrid {
    public static void main(String[] args) {

        //create a pdf document
        PdfDocument doc = new PdfDocument();
        PdfPageBase page = doc.getPages().add();

        //create a PdfGrid object
        PdfGrid grid = new PdfGrid();

        //set the cell padding, font, text brush, background brush of the grid
        grid.getStyle().setCellPadding(new PdfPaddings(3,3,3,3));
        grid.getStyle().setFont(new PdfTrueTypeFont(new Font("Arial Unicode MS", Font.PLAIN,10), true));
        grid.getStyle().setTextBrush(PdfBrushes.getBlack());
        grid.getStyle().setBackgroundBrush(PdfBrushes.getLightGray());

        //create a PdfBorders object
        PdfBorders borders= new PdfBorders();
        borders.setAll(new PdfPen(PdfBrushes.getWhite(),1f));

        //define sample data
        String[] data = {"Continent;Country;Population;Ratio to World Pop;Flag",
                "Asia;China;1,391,190,000;18.2%; ",
                "Asia;Japan;126,490,000;1.66%; ",
                "Europe;United Kingdom;65,648,054;0.86%; ",
                "Europe;Germany;82,665,600;1.08%; ",
                "North America; Canada; 37,119,000; 0.49%; ",
                "North America; United States; 327,216,000; 4.29%; "
        };
        String[][] dataSource = new String[data.length][];
        for (int i = 0; i < data.length; i++) {
            dataSource[i] = data[i].split("[;]", -1);
        }

        //fill the grid with data
        grid.setDataSource(dataSource);

        //fill the cells with background images
        grid.getRows().get(1).getCells().get(4).getStyle().setBackgroundImage(PdfImage.fromFile("F:\\Documents\\flags\\flag-of-China.png"));
        grid.getRows().get(2).getCells().get(4).getStyle().setBackgroundImage(PdfImage.fromFile("F:\\Documents\\flags\\flag-of-Japan.png"));
        grid.getRows().get(3).getCells().get(4).getStyle().setBackgroundImage(PdfImage.fromFile("F:\\Documents\\flags\\flag-of-United-Kingdom.png"));
        grid.getRows().get(4).getCells().get(4).getStyle().setBackgroundImage(PdfImage.fromFile("F:\\Documents\\flags\\flag-of-Germany.png"));
        grid.getRows().get(5).getCells().get(4).getStyle().setBackgroundImage(PdfImage.fromFile("F:\\Documents\\flags\\flag-of-Canada.png"));
        grid.getRows().get(6).getCells().get(4).getStyle().setBackgroundImage(PdfImage.fromFile("F:\\Documents\\flags\\flag-of-United-States-of-America.png"));

        //set the width of the last column
        grid.getColumns().get(grid.getColumns().getCount()-1).setWidth(60f);

        //vertically span cells
        grid.getRows().get(1).getCells().get(0).setRowSpan(2);
        grid.getRows().get(3).getCells().get(0).setRowSpan(2);
        grid.getRows().get(5).getCells().get(0).setRowSpan(2);

        for (int i = 0; i < data.length ; i++) {

            //set the height of each row
            grid.getRows().get(i).setHeight(30f);

            //set the background color of the first column
            grid.getRows().get(i).getCells().get(0).getStyle().setBackgroundBrush(PdfBrushes.getDarkGray());

            //set the font of the first column
            grid.getRows().get(i).getCells().get(0).getStyle().setFont(new PdfTrueTypeFont(new Font("Arial",Font.PLAIN,12),true));

            for (int j = 0; j < grid.getColumns().getCount(); j++) {

                //apply border style to all cells
                grid.getRows().get(i).getCells().get(j).getStyle().setBorders(borders);

                //apply text alignment to all cells
                grid.getRows().get(i).getCells().get(j).setStringFormat(new PdfStringFormat(PdfTextAlignment.Center,PdfVerticalAlignment.Middle));

                //set the font of the first row
                grid.getRows().get(0).getCells().get(j).getStyle().setFont(new PdfTrueTypeFont(new Font("Arial",Font.PLAIN,12),true));

                //set the background color of the first row
                grid.getRows().get(0).getCells().get(j).getStyle().setBackgroundBrush(PdfBrushes.getDarkGray());
            }
        }

        //draw grid on the pdf page
        grid.draw(page,0,30);

        //save to file
        doc.saveToFile("Grid.pdf");
        doc.close();
    }
}
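The sample data is split with a limit of -1. With a negative limit, `String.split` keeps trailing empty strings, so a row whose last field is blank still yields all five columns; with the default limit, trailing empty fields are dropped. The sketch below demonstrates this with plain Java. Trimming each field is an added assumption (the listing does not trim, so values such as " Canada" keep their leading space), and `SampleDataParser` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; plain Java, no Spire.PDF calls.
class SampleDataParser {

    // Splits each semicolon-separated line into trimmed fields.
    static String[][] parse(String[] lines) {
        String[][] rows = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            // A negative limit keeps trailing empty fields.
            String[] fields = lines[i].split(";", -1);
            for (int j = 0; j < fields.length; j++) {
                fields[j] = fields[j].trim();
            }
            rows[i] = fields;
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] data = {
            "Continent;Country;Population;Ratio to World Pop;Flag",
            "North America; Canada; 37,119,000; 0.49%;"
        };
        for (String[] row : parse(data)) {
            System.out.println(row.length + " fields: " + Arrays.toString(row));
        }
        // Without the negative limit, the empty last field is dropped.
        System.out.println("a;b;".split(";").length);
    }
}
```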
A table presents data in horizontal rows and vertical columns. Presenting data in a table is often more efficient than describing it in paragraph text, especially when the data is numerical or voluminous, and the tabular layout makes it easier to read and understand. In this article, you will learn how to create tables in a PDF document in Java using Spire.PDF for Java.
Spire.PDF for Java offers the PdfTable and PdfGrid classes to work with tables in a PDF document. The PdfTable class is used to quickly create simple, regular tables without much formatting, while the PdfGrid class is used to create more complex tables.
The table below lists the differences between these two classes.
| | PdfTable | PdfGrid |
|---|---|---|
| Formatting | | |
| Row | Can be set through events. No API support. | Can be set through API. |
| Column | Can be set through API. | Can be set through API. |
| Cell | Can be set through events. No API support. | Can be set through API. |
| Others | | |
| Column span | Not supported. | Can be set through API. |
| Row span | Can be set through events. No API support. | Can be set through API. |
| Nested table | Can be set through events. No API support. | Can be set through API. |
| Events | BeginCellLayout, EndCellLayout, BeginRowLayout, EndRowLayout, BeginPageLayout, EndPageLayout. | BeginPageLayout, EndPageLayout. |
The following sections demonstrate how to create a table in PDF using the PdfTable class and the PdfGrid class, respectively.
Install Spire.PDF for Java
First of all, you're required to add the Spire.PDF.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.4.4</version>
    </dependency>
</dependencies>
Create a Table in PDF Using PdfTable Class
The following are the steps to create a table with the PdfTable class in Spire.PDF for Java.
- Create a PdfDocument object.
- Add a page to it using PdfDocument.getPages().add() method.
- Create a PdfTable object.
- Set the table style using the methods of the PdfTableStyle object, which is returned by the PdfTable.getStyle() method.
- Insert data into the table using the PdfTable.setDataSource() method.
- Set the row height and row color through the BeginRowLayout event.
- Draw the table on the PDF page using the PdfTable.draw() method.
- Save the document to a PDF file using PdfDocument.saveToFile() method.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.PdfPageSize;
import com.spire.pdf.graphics.*;
import com.spire.pdf.tables.*;

import java.awt.*;
import java.awt.geom.Point2D;

public class CreateTable {
    public static void main(String[] args) {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Add a page
        PdfPageBase page = doc.getPages().add(PdfPageSize.A4, new PdfMargins(40));

        //Create a PdfTable object
        PdfTable table = new PdfTable();

        //Set font for header and the rest cells
        table.getStyle().getDefaultStyle().setFont(new PdfTrueTypeFont(new Font("Times New Roman", Font.PLAIN, 12), true));
        table.getStyle().getHeaderStyle().setFont(new PdfTrueTypeFont(new Font("Times New Roman", Font.BOLD, 12), true));

        //Define data
        String[] data = {"ID;Name;Department;Position;Level",
                "1; David; IT; Manager; 1",
                "3; Julia; HR; Manager; 1",
                "4; Sophie; Marketing; Manager; 1",
                "7; Wickey; Marketing; Sales Rep; 2",
                "9; Wayne; HR; HR Supervisor; 2",
                "11; Mia; Dev; Developer; 2"};
        String[][] dataSource = new String[data.length][];
        for (int i = 0; i < data.length; i++) {
            dataSource[i] = data[i].split("[;]", -1);
        }

        //Set data as the table data
        table.setDataSource(dataSource);

        //Set the first row as header row
        table.getStyle().setHeaderSource(PdfHeaderSource.Rows);
        table.getStyle().setHeaderRowCount(1);

        //Show header (the header is hidden by default)
        table.getStyle().setShowHeader(true);

        //Set font color and background color of header row
        table.getStyle().getHeaderStyle().setBackgroundBrush(PdfBrushes.getGray());
        table.getStyle().getHeaderStyle().setTextBrush(PdfBrushes.getWhite());

        //Set text alignment in header row
        table.getStyle().getHeaderStyle().setStringFormat(new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle));

        //Set text alignment in other cells
        for (int i = 0; i < table.getColumns().getCount(); i++) {
            table.getColumns().get(i).setStringFormat(new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle));
        }

        //Register with BeginRowLayout event
        table.beginRowLayout.add(new BeginRowLayoutEventHandler() {
            public void invoke(Object sender, BeginRowLayoutEventArgs args) {
                Table_BeginRowLayout(sender, args);
            }
        });

        //Draw table on the page
        table.draw(page, new Point2D.Float(0, 30));

        //Save the document to a PDF file
        doc.saveToFile("output/PdfTable.pdf");
    }

    //Event handler
    private static void Table_BeginRowLayout(Object sender, BeginRowLayoutEventArgs args) {

        //Set row height
        args.setMinimalHeight(20f);

        //Alternate color of rows except the header row
        if (args.getRowIndex() == 0) {
            return;
        }
        if (args.getRowIndex() % 2 == 0) {
            args.getCellStyle().setBackgroundBrush(PdfBrushes.getLightGray());
        } else {
            args.getCellStyle().setBackgroundBrush(PdfBrushes.getWhite());
        }
    }
}
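The event handler in the listing returns early for row index 0 (which its comment treats as the header row), shades even-indexed rows light gray, and leaves odd-indexed rows white. The sketch below isolates that decision as a pure function so it can be checked without a PDF library. `RowShading` and its `Shade` enum are hypothetical stand-ins for the Spire.PDF brushes, not part of the library.

```java
// Hypothetical model of the row-shading rule; not part of Spire.PDF.
class RowShading {

    // Stand-ins for the brushes used in the event handler.
    enum Shade { NONE, LIGHT_GRAY, WHITE }

    // Returns the shade the handler would apply to the given row index.
    static Shade shadeFor(int rowIndex) {
        if (rowIndex == 0) {
            // The handler returns early here and keeps the header style.
            return Shade.NONE;
        }
        return rowIndex % 2 == 0 ? Shade.LIGHT_GRAY : Shade.WHITE;
    }

    public static void main(String[] args) {
        for (int row = 0; row < 5; row++) {
            System.out.println("row " + row + " -> " + shadeFor(row));
        }
    }
}
```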
Create a Table in PDF Using PdfGrid Class
Below are the steps to create a table in PDF with the PdfGrid class in Spire.PDF for Java.
- Create a PdfDocument object.
- Add a page to it using PdfDocument.getPages().add() method.
- Create a PdfGrid object.
- Set the table style using the methods of the PdfGridStyle object, which is returned by the PdfGrid.getStyle() method.
- Add rows and columns to the table using the PdfGrid.getRows().add() method and the PdfGrid.getColumns().add() method.
- Insert data into specific cells using the PdfGridCell.setValue() method.
- Span cells across columns or rows using the PdfGridCell.setColumnSpan() method or the PdfGridCell.setRowSpan() method.
- Set the formatting of a specific cell using the PdfGridCell.setStringFormat() method and the methods of the PdfGridCellStyle object.
- Draw the table on the PDF page using the PdfGrid.draw() method.
- Save the document to a PDF file using PdfDocument.saveToFile() method.
import com.spire.pdf.*;
import com.spire.pdf.graphics.*;
import com.spire.pdf.grid.PdfGrid;
import com.spire.pdf.grid.PdfGridRow;

import java.awt.*;
import java.awt.geom.Point2D;

public class CreateGrid {
    public static void main(String[] args) {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Add a page
        PdfPageBase page = doc.getPages().add(PdfPageSize.A4,new PdfMargins(40));

        //Create a PdfGrid
        PdfGrid grid = new PdfGrid();

        //Set cell padding
        grid.getStyle().setCellPadding(new PdfPaddings(1, 1, 1, 1));

        //Set font
        grid.getStyle().setFont(new PdfTrueTypeFont(new Font("Times New Roman", Font.PLAIN, 13), true));

        //Add rows and columns
        PdfGridRow row1 = grid.getRows().add();
        PdfGridRow row2 = grid.getRows().add();
        PdfGridRow row3 = grid.getRows().add();
        PdfGridRow row4 = grid.getRows().add();
        grid.getColumns().add(4);

        //Set column width
        for (int i = 0; i < grid.getColumns().getCount(); i++) {
            grid.getColumns().get(i).setWidth(120);
        }

        //Write data into specific cells
        row1.getCells().get(0).setValue("Order and Payment Status");
        row2.getCells().get(0).setValue("Order number");
        row2.getCells().get(1).setValue("Date");
        row2.getCells().get(2).setValue("Customer");
        row2.getCells().get(3).setValue("Paid or not");
        row3.getCells().get(0).setValue("00223");
        row3.getCells().get(1).setValue("2022/06/02");
        row3.getCells().get(2).setValue("Brick Lane Realty");
        row3.getCells().get(3).setValue("Yes");
        row4.getCells().get(0).setValue("00224");
        row4.getCells().get(1).setValue("2022/06/03");
        row4.getCells().get(3).setValue("No");

        //Span cell across columns
        row1.getCells().get(0).setColumnSpan(4);

        //Span cell across rows
        row3.getCells().get(2).setRowSpan(2);

        //Set text alignment of specific cells
        row1.getCells().get(0).setStringFormat(new PdfStringFormat(PdfTextAlignment.Center));
        row3.getCells().get(2).setStringFormat(new PdfStringFormat(PdfTextAlignment.Left, PdfVerticalAlignment.Middle));

        //Set background color of specific cells
        row1.getCells().get(0).getStyle().setBackgroundBrush(PdfBrushes.getOrange());
        row4.getCells().get(3).getStyle().setBackgroundBrush(PdfBrushes.getLightGray());

        //Format cell border
        PdfBorders borders = new PdfBorders();
        borders.setAll(new PdfPen(new PdfRGBColor(Color.ORANGE), 0.8f));

        //Loop over the rows that were actually added (count, not capacity)
        for (int i = 0; i < grid.getRows().getCount(); i++) {
            PdfGridRow gridRow = grid.getRows().get(i);
            gridRow.setHeight(20f);
            for (int j = 0; j < gridRow.getCells().getCount(); j++) {
                gridRow.getCells().get(j).getStyle().setBorders(borders);
            }
        }

        //Draw table on the page
        grid.draw(page, new Point2D.Float(0, 30));

        //Save the document to a PDF file
        doc.saveToFile("output/PdfGrid.pdf");
    }
}
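The listing never assigns a value to the third cell of row4, because the cell above it (the third cell of row3) spans two rows and therefore occupies that position. The sketch below models this with a plain boolean grid: a span anchored at one cell marks the other cells it covers. `SpanModel` and all of its members are hypothetical and only illustrate the idea; the sketch makes no claim about how Spire.PDF tracks spans internally.

```java
// Hypothetical model of row/column spans; not part of Spire.PDF.
class SpanModel {

    private final boolean[][] covered;

    SpanModel(int rows, int cols) {
        if (rows < 1 || cols < 1) {
            throw new IllegalArgumentException("grid must have at least one cell");
        }
        covered = new boolean[rows][cols];
    }

    // Marks the cells hidden behind a span anchored at (row, col).
    // The anchor cell itself stays visible and holds the value.
    void span(int row, int col, int rowSpan, int colSpan) {
        if (row < 0 || col < 0 || rowSpan < 1 || colSpan < 1
                || row + rowSpan > covered.length
                || col + colSpan > covered[0].length) {
            throw new IllegalArgumentException("span does not fit in the grid");
        }
        for (int r = row; r < row + rowSpan; r++) {
            for (int c = col; c < col + colSpan; c++) {
                if (r != row || c != col) {
                    covered[r][c] = true;
                }
            }
        }
    }

    boolean isCovered(int row, int col) {
        return covered[row][col];
    }

    // Mirrors the spans of the 4 x 4 order table using zero-based indices.
    static SpanModel orderTable() {
        SpanModel model = new SpanModel(4, 4);
        model.span(0, 0, 1, 4); // title cell spans four columns
        model.span(2, 2, 2, 1); // customer cell spans two rows
        return model;
    }

    public static void main(String[] args) {
        SpanModel model = orderTable();
        // The third cell of the fourth row sits under the row span.
        System.out.println(model.isCovered(3, 2));
    }
}
```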