Java: Extract Images from a PDF Document

2022-03-22 07:09:00 Written by jie zou


If you'd like to reuse the images embedded in a PDF document elsewhere, you can extract them and save them to a folder. This article shows how to extract images from a PDF document programmatically using Spire.PDF for Java.

Install Spire.PDF for Java

First, add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.4.4</version>
    </dependency>
</dependencies>

Extract Images from a PDF Document

Spire.PDF for Java offers the PdfPageBase.extractImages() method to extract images from a PDF document. The detailed steps are listed below.

Create a PdfDocument instance and load a sample PDF document using the PdfDocument.loadFromFile() method.
Loop through all pages of the document and extract the images from each page using the PdfPageBase.extractImages() method.
Specify the path and name of each output image file.
Save the images as .png files.

Java

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ExtractImage {
    public static void main(String[] args) throws IOException {
        // Create a PdfDocument instance
        PdfDocument doc = new PdfDocument();

        // Load a sample PDF file
        doc.loadFromFile("sample.pdf");

        // Make sure the output folder exists; ImageIO.write() does not create it
        File outputDir = new File("C:\\Users\\Administrator\\Desktop\\ExtractedImages");
        outputDir.mkdirs();

        // Running counter used to give each image a unique file name
        int index = 0;

        // Loop through all pages
        for (PdfPageBase page : (Iterable<PdfPageBase>) doc.getPages()) {

            // Extract the images from the current page
            for (BufferedImage image : page.extractImages()) {

                // Build the output file path and name
                File output = new File(outputDir, String.format("Image_%d.png", index++));

                // Save the image as a .png file
                ImageIO.write(image, "PNG", output);
            }
        }

        // Release the resources held by the document
        doc.close();
    }
}
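
The saving step can also be factored into a small helper that uses only the JDK, which makes it testable without a PDF at hand. The following is a minimal sketch rather than part of Spire.PDF: the class name ExtractedImageWriter and the method saveAll are hypothetical names introduced here for illustration. It creates the output folder if it is missing, reuses the Image_%d.png naming from the example above, and checks the boolean returned by ImageIO.write(), which is false when no suitable writer is found.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of Spire.PDF): writes images to a folder as PNG files.
final class ExtractedImageWriter {

    private ExtractedImageWriter() {
    }

    // Saves each image as Image_<n>.png, numbering from startIndex,
    // and returns the paths of the files that were written.
    static List<Path> saveAll(Iterable<BufferedImage> images, Path outputDir, int startIndex)
            throws IOException {
        // Create the output folder (and any missing parents) before writing
        Files.createDirectories(outputDir);

        List<Path> written = new ArrayList<>();
        int index = startIndex;
        for (BufferedImage image : images) {
            Path target = outputDir.resolve(String.format("Image_%d.png", index++));

            // ImageIO.write() returns false if no appropriate writer is available
            if (!ImageIO.write(image, "PNG", target.toFile())) {
                throw new IOException("No PNG writer available for " + target);
            }
            written.add(target);
        }
        return written;
    }
}
```

Inside the page loop, the images extracted from each page could be passed to saveAll together with the running index (wrapped with Arrays.asList(...) if they come back as an array), with the index then advanced by the size of the returned list.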


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or lift the functional limitations, you can request a 30-day trial license.


Last modified on Thursday, 27 July 2023 07:10


Published in Extract/Read
