Find Text in PDF by Regular Expression in Java

2020-08-24 06:58:42 Written by support iceblue

This article demonstrates how to use Spire.PDF for Java to find all text in a PDF document that matches a regular expression and highlight each match.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.general.find.PdfTextFind;
import com.spire.pdf.general.find.TextFindParameter;
import java.awt.Color;
import java.util.EnumSet;

public class FindByRegularExpression {

    public static void main(String[] args) throws Exception {

        //Load a PDF document
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("C:\\Users\\Administrator\\Desktop\\test.pdf");

        //Define a regular expression: a '#' followed by one or more word characters (e.g. "#Java")
        String pattern = "\\#\\w+\\b";

        //Declare an array to hold the matches found on each page
        PdfTextFind[] results;

        //Loop through the pages
        for (Object page : (Iterable) pdf.getPages()) {
            PdfPageBase pageBase = (PdfPageBase) page;

            //Find all results that match the pattern;
            //TextFindParameter.Regex tells findText to treat the string as a regular expression
            //rather than as literal text
            results = pageBase.findText(pattern, EnumSet.of(TextFindParameter.Regex)).getFinds();

            //Highlight the search results with yellow
            for (PdfTextFind find : results) {
                find.applyHighLight(Color.yellow);
            }
        }

        //Save to file
        pdf.saveToFile("FindByPattern.pdf");
    }
}
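Before running a pattern against a whole PDF, it can help to check what it matches on a short sample string. The sketch below does this with the JDK's java.util.regex package rather than Spire.PDF, so it assumes that Spire.PDF interprets a simple pattern like "\\#\\w+\\b" the same way java.util.regex does; that is not guaranteed for every regex feature. The class name PreviewPattern, the helper findAll, and the sample text are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreviewPattern {

    // Hypothetical helper: returns every substring of text that matches regex, in order
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Same pattern as the Spire.PDF example: '#' followed by one or more word characters
        String pattern = "\\#\\w+\\b";

        // Hypothetical sample text standing in for the content of a PDF page
        String sample = "Tags: #Java #PDF_2020 and C# are supported.";

        // "C#" is not matched because '#' is followed by a space, not a word character
        System.out.println(findAll(pattern, sample)); // prints [#Java, #PDF_2020]
    }
}
```

Note that \w also matches digits and underscores, which is why "#PDF_2020" is returned as a single match.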

Last modified on Thursday, 02 September 2021 06:36