Java: Extract Attachments from PDF Documents

Many users store files inside PDF documents as attachments, which can later be extracted and reused when needed. A PDF can contain two types of attachments: document-level attachments and annotation attachments. The differences between them are as follows.

  • Document-Level Attachment (represented by the PdfAttachment class): A file attached at the document level does not appear on any page; it can only be viewed in the "Attachments" panel of a PDF reader.
  • Annotation Attachment (represented by the PdfAttachmentAnnotation class): A file is attached at a specific position on a page. Annotation attachments are shown as a paper clip icon on the page, and reviewers can double-click the icon to open the file.
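At the file level, the PDF specification (ISO 32000) stores the two kinds differently: document-level attachments are listed in the /EmbeddedFiles name tree of the document catalog, while annotation attachments are annotations whose /Subtype is /FileAttachment. The following stdlib-only sketch is a rough heuristic, not part of Spire.PDF. It looks for those two name tokens in the raw bytes of a file. It cannot see objects stored in compressed object streams or in encrypted files, so treat a result only as a hint of what the document may contain. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Heuristic sketch (an assumption, not part of Spire.PDF): report which kinds of
// attachment markers appear in the raw bytes of an uncompressed PDF.
final class AttachmentKindSniffer {

    // Returns true if the PDF name `token` occurs as a complete name in the file body
    static boolean hasToken(byte[] pdf, String token) {
        // ISO-8859-1 maps each byte to exactly one char, so binary streams decode safely
        String body = new String(pdf, StandardCharsets.ISO_8859_1);
        int from = 0;
        while (true) {
            int idx = body.indexOf(token, from);
            if (idx < 0) {
                return false;
            }
            int end = idx + token.length();
            // A PDF name ends at whitespace or a delimiter; this rejects longer names with the same prefix
            if (end == body.length() || isNameTerminator(body.charAt(end))) {
                return true;
            }
            from = end;
        }
    }

    private static boolean isNameTerminator(char c) {
        return c == '\0' || Character.isWhitespace(c) || "()<>[]{}/%".indexOf(c) >= 0;
    }

    // Document-level attachments are listed in the /EmbeddedFiles name tree
    static boolean mayHaveDocumentLevelAttachments(byte[] pdf) {
        return hasToken(pdf, "/EmbeddedFiles");
    }

    // Annotation attachments are annotations with /Subtype /FileAttachment
    static boolean mayHaveAnnotationAttachments(byte[] pdf) {
        return hasToken(pdf, "/FileAttachment");
    }

    public static void main(String[] args) {
        byte[] sample = "1 0 obj << /Type /Annot /Subtype /FileAttachment /FS 2 0 R >> endobj"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(mayHaveDocumentLevelAttachments(sample)); // false
        System.out.println(mayHaveAnnotationAttachments(sample));    // true
    }
}
```

To check a real file, pass the result of `java.nio.file.Files.readAllBytes(path)` to either method. Use the Spire.PDF code in the sections below for the actual extraction.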

In this article, you will learn how to extract these two kinds of attachments from a PDF document in Java using Spire.PDF for Java.

Install Spire.PDF for Java

First, you need to add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from the e-iceblue website. If you use Maven, you can import it by adding the following configuration to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.3.4</version>
    </dependency>
</dependencies>
    

Extract Document-Level Attachments from PDF in Java

The document-level attachments of a PDF document can be obtained using the PdfDocument.getAttachments() method. The following steps show how to extract them and save them to a local folder.

  • Create a PdfDocument object.
  • Load a PDF file using the PdfDocument.loadFromFile() method.
  • Get the attachment collection from the document using the PdfDocument.getAttachments() method.
  • Loop through the collection and get each attachment using the PdfAttachmentCollection.get() method. Get its file name and data using the PdfAttachment.getFileName() and PdfAttachment.getData() methods, then write the data to a file in a specified folder.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.attachments.PdfAttachment;
import com.spire.pdf.attachments.PdfAttachmentCollection;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class ExtractAttachments {

    public static void main(String[] args) throws Exception {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Load a PDF file that contains attachments
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Attachments.pdf");

        //Make sure the output folder exists
        File outputDir = new File("C:\\Users\\Administrator\\Desktop\\output");
        outputDir.mkdirs();

        //Get the attachment collection of the PDF document
        PdfAttachmentCollection attachments = doc.getAttachments();

        //Loop through the collection
        for (int i = 0; i < attachments.getCount(); i++) {

            //Get a specific attachment
            PdfAttachment attachment = attachments.get(i);

            //Specify the output file path and name
            File file = new File(outputDir, attachment.getFileName());

            //Write the attachment data to the file; try-with-resources closes the stream
            try (OutputStream output = new BufferedOutputStream(new FileOutputStream(file))) {
                output.write(attachment.getData());
            }
        }

        //Release resources held by the document
        doc.close();
    }
}
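The loop above builds the output path from the name returned by getFileName(), and that name comes from the PDF itself. If a document stores a name such as "..\..\evil.dll", simply joining it to the output folder could write outside that folder. The sketch below uses a hypothetical helper, not a Spire.PDF API. It keeps only the last path component of the stored name and rejects names that cannot be used.

```java
import java.io.File;

// Hypothetical helper (not part of Spire.PDF): map the file name stored in a PDF
// attachment to an output file placed directly inside outputDir.
final class AttachmentPaths {

    static File safeOutputFile(File outputDir, String storedName) {
        String name = storedName == null ? "" : storedName;
        // Keep only the last component, whether the name uses '\', '/' or a drive prefix such as "C:"
        int cut = Math.max(name.lastIndexOf('\\'), Math.max(name.lastIndexOf('/'), name.lastIndexOf(':')));
        name = name.substring(cut + 1);
        // Names that would refer to the folder itself or its parent are refused
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            throw new IllegalArgumentException("Unusable attachment name: " + storedName);
        }
        return new File(outputDir, name);
    }

    public static void main(String[] args) {
        File dir = new File("output");
        System.out.println(safeOutputFile(dir, "..\\..\\Windows\\evil.dll").getName()); // evil.dll
        System.out.println(safeOutputFile(dir, "reports/2023/summary.xlsx").getName()); // summary.xlsx
    }
}
```

In the extraction loop, the output file could be created as `AttachmentPaths.safeOutputFile(new File("C:\\Users\\Administrator\\Desktop\\output"), attachments.get(i).getFileName())`.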


Extract Annotation Attachments from PDF in Java

An annotation attachment is a page-level element. To get the annotations on a specific page, use the PdfPageBase.getAnnotationsWidget() method. You then need to check whether each annotation is an annotation attachment. The following steps show how to extract the annotation attachments from a whole document and save them to a local folder.

  • Create a PdfDocument object.
  • Load a PDF file using the PdfDocument.loadFromFile() method.
  • Loop through the pages of the document and get each page using the PdfDocument.getPages().get() method.
  • Get the annotation collection from the page using the PdfPageBase.getAnnotationsWidget() method.
  • Check whether each annotation is an instance of PdfAttachmentAnnotationWidget. If it is, write the attachment data to a file in a specified folder.
import com.spire.pdf.PdfDocument;
import com.spire.pdf.annotations.PdfAnnotationCollection;
import com.spire.pdf.annotations.PdfAttachmentAnnotationWidget;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class ExtractAnnotationAttachments {

    public static void main(String[] args) throws Exception {

        //Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //Load a PDF file that contains annotation attachments
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\AnnotationAttachments.pdf");

        //Make sure the output folder exists
        File outputDir = new File("C:\\Users\\Administrator\\Desktop\\output");
        outputDir.mkdirs();

        //Loop through the pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {

            //Get the annotation collection of the current page
            PdfAnnotationCollection collection = doc.getPages().get(i).getAnnotationsWidget();

            //Loop through the annotations
            for (Object annotation : collection) {

                //Only annotation attachments carry an embedded file
                if (annotation instanceof PdfAttachmentAnnotationWidget) {
                    PdfAttachmentAnnotationWidget attachment = (PdfAttachmentAnnotationWidget) annotation;

                    //The stored file name may include a path; keep only the last component
                    String fullPath = attachment.getFileName();
                    int separator = Math.max(fullPath.lastIndexOf('\\'), fullPath.lastIndexOf('/'));
                    File file = new File(outputDir, fullPath.substring(separator + 1));

                    //Write the attachment data to the file; try-with-resources closes the stream
                    try (OutputStream output = new BufferedOutputStream(new FileOutputStream(file))) {
                        output.write(attachment.getData());
                    }
                }
            }
        }

        //Release resources held by the document
        doc.close();
    }
}
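Annotation attachments on different pages, or at different positions on the same page, can carry the same file name. Because FileOutputStream overwrites an existing file, the loop above keeps only the last one. One option is to append a counter to repeated names. The sketch below does this with a hypothetical helper that is not part of Spire.PDF.

```java
import java.io.File;
import java.nio.file.Files;

// Hypothetical helper (not part of Spire.PDF): choose an output file that does not
// overwrite one written earlier, e.g. "report (1).pdf" when "report.pdf" exists.
final class UniqueNames {

    static File uniqueFile(File dir, String name) {
        int dot = name.lastIndexOf('.');
        // A leading dot (".hidden") belongs to the base name rather than an extension
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        File candidate = new File(dir, name);
        // Try "base (1).ext", "base (2).ext", ... until a free name is found
        for (int n = 1; candidate.exists(); n++) {
            candidate = new File(dir, base + " (" + n + ")" + ext);
        }
        return candidate;
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("attachments").toFile();
        uniqueFile(dir, "report.pdf").createNewFile();               // creates report.pdf
        System.out.println(uniqueFile(dir, "report.pdf").getName()); // report (1).pdf
    }
}
```

The existence check and the later file creation are two separate steps, which is fine for a single-threaded extraction loop like the one above. If several threads write to the same folder, a different approach would be needed. In the loop, the `File file = ...` line could call `UniqueNames.uniqueFile(...)` with the output folder and the extracted file name.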


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the feature limitations, please request a 30-day trial license.