Java: Convert PDF to HTML

The PDF format keeps the presentation of documents consistent across devices. However, when you need to publish PDF documents on web pages, it is often better to convert them to HTML files. That way, the content can be displayed directly in the browser without downloading a file. Large PDF documents can also take a long time to load, whereas HTML pages usually render quickly. In addition, HTML pages are generally easier for search engines to crawl than PDF files, which can give your website more exposure. This article shows how to convert PDF documents into HTML files in Java using Spire.PDF for Java.

Install Spire.PDF for Java

First, add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>10.3.4</version>
    </dependency>
</dependencies>
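
After adding the dependency, it can help to confirm that the library is actually visible at runtime. The following is a minimal sketch using only the JDK, not something provided by Spire.PDF: `ClasspathCheck` and `isOnClasspath` are hypothetical names, and the class name `com.spire.pdf.PdfDocument` is inferred from the `com.spire.pdf` import and the `PdfDocument` class used in the examples in this article.

```java
// Hypothetical helper (not part of Spire.PDF): checks whether a class can be loaded.
final class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed fully qualified name, based on the import used in this article
        String name = "com.spire.pdf.PdfDocument";
        System.out.println(name + (isOnClasspath(name) ? " is available" : " was NOT found"));
    }
}
```

If the class is reported as not found, the JAR or the Maven dependency is probably not on the runtime classpath.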

Convert a PDF document to an HTML file in Java

A PDF document can be converted directly to an HTML file by loading it and then saving it with the PdfDocument.saveToFile(String filename, FileFormat.HTML) method provided by Spire.PDF for Java. The detailed steps are as follows.

  • Create an object of PdfDocument.
  • Load a PDF file using the PdfDocument.loadFromFile() method.
  • Save the PDF file as an HTML file using the PdfDocument.saveToFile() method.
Java
import com.spire.pdf.*;

public class ConvertPDFToHTML {
    public static void main(String[] args) {

        //Create an object of PdfDocument
        PdfDocument pdf = new PdfDocument();

        //Load a PDF file
        pdf.loadFromFile("C:/Guide to a Foreign Past.pdf");

        //Save the PDF file as an HTML file
        pdf.saveToFile("PDFToHTML.html", FileFormat.HTML);
        pdf.close();
    }
}
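
The example above hard-codes both file names. When several files are converted, it may be more convenient to derive the output name from the input path. The following is a minimal sketch using only the JDK; `HtmlOutputName` and `htmlNameFor` are hypothetical names, and the returned string is meant to be passed as the first argument of `saveToFile`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Spire.PDF): derives an .html file name from a PDF path.
final class HtmlOutputName {

    // Replaces the last extension of the file name with ".html";
    // a name without an extension simply gets ".html" appended.
    static String htmlNameFor(String pdfPath) {
        Path fileNamePart = Paths.get(pdfPath).getFileName();
        if (fileNamePart == null) {
            throw new IllegalArgumentException("Path has no file name: " + pdfPath);
        }
        String fileName = fileNamePart.toString();
        int dot = fileName.lastIndexOf('.');
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;
        return base + ".html";
    }

    public static void main(String[] args) {
        // prints "Guide to a Foreign Past.html"
        System.out.println(htmlNameFor("C:/Guide to a Foreign Past.pdf"));
    }
}
```

The sketch writes the HTML file into the working directory, as the original example does; keeping the input's parent directory instead would be a small change.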

Convert a PDF document to an HTML file with SVG Embedded

Spire.PDF for Java also provides the PdfDocument.getConvertOptions().setPdfToHtmlOptions(true) method to embed SVG during conversion. The detailed steps for converting a PDF file to an HTML file with embedded SVG are as follows.

  • Create an object of PdfDocument.
  • Load a PDF file using the PdfDocument.loadFromFile() method.
  • Enable SVG embedding using the PdfDocument.getConvertOptions().setPdfToHtmlOptions(true) method.
  • Save the PDF file as an HTML file using the PdfDocument.saveToFile() method.
Java
import com.spire.pdf.*;

public class ConvertPDFToHTMLEmbeddingSVG {
    public static void main(String[] args) {

        //Create an object of PdfDocument
        PdfDocument doc = new PdfDocument();

        //Load a PDF file
        doc.loadFromFile("C:/Guide to a Foreign Past.pdf");

        //Set embedding SVG
        doc.getConvertOptions().setPdfToHtmlOptions(true);

        //Save the PDF file as an HTML file
        doc.saveToFile("PDFToHTMLEmbeddingSVG.html", FileFormat.HTML);
        doc.close();
    }
}
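
To check whether the option had an effect, one rough approach is to look for SVG markers in the generated HTML. The following sketch rests on an assumption that this article does not confirm: that embedded SVG shows up either as inline `<svg` elements or as `image/svg+xml` data. `SvgCheck` and `mentionsSvg` are hypothetical names, and the sample strings stand in for the contents of the generated file.

```java
import java.util.Locale;

// Hypothetical helper (not part of Spire.PDF): a crude text check for SVG content.
final class SvgCheck {

    // Returns true if the HTML text contains an inline <svg element or an SVG MIME type.
    // This is a plain substring search, not an HTML parser.
    static boolean mentionsSvg(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        return lower.contains("<svg") || lower.contains("image/svg+xml");
    }

    public static void main(String[] args) {
        // Sample strings; in practice the text would be read from the generated HTML file
        System.out.println(mentionsSvg("<body><svg width=\"10\"></svg></body>"));
        System.out.println(mentionsSvg("<body><p>plain text</p></body>"));
    }
}
```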

Convert a PDF document to an HTML Stream in Java

Spire.PDF for Java also supports saving a PDF document to an HTML stream. The detailed steps are as follows.

  • Create an object of PdfDocument.
  • Load a PDF file using the PdfDocument.loadFromFile() method.
  • Save the PDF file as an HTML stream using the PdfDocument.saveToStream() method.
Java
import com.spire.pdf.*;

import java.io.*;


public class ConvertPDFToHTMLStream {
    public static void main(String[] args) throws IOException {

        //Create an object of PdfDocument
        PdfDocument pdf = new PdfDocument();

        //Load a PDF file
        pdf.loadFromFile("C:/Guide to a Foreign Past.pdf");

        //Save the PDF file as an HTML stream; try-with-resources flushes and closes the stream
        File outFile = new File("PDFToHTMLStream.html");
        try (OutputStream outputStream = new FileOutputStream(outFile)) {
            pdf.saveToStream(outputStream, FileFormat.HTML);
        }
        pdf.close();
    }
}
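
A stream does not have to be backed by a file. The following sketch captures the output in memory and turns it into a string, for example to return it from a web handler. `HtmlCapture`, `StreamWriter`, and `captureAsString` are hypothetical names, and decoding the bytes as UTF-8 is an assumption about the encoding of the generated HTML. A stand-in writer is used so that the sketch runs without the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Spire.PDF): collects streamed output in memory.
final class HtmlCapture {

    // Hypothetical callback: anything that writes HTML bytes to a stream.
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Runs the writer against an in-memory buffer and decodes the bytes.
    // UTF-8 is an assumption; adjust it if the generated HTML uses another encoding.
    static String captureAsString(StreamWriter writer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writer.writeTo(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in writer. With Spire.PDF the callback would instead be:
        //   out -> pdf.saveToStream(out, FileFormat.HTML)
        String html = captureAsString(
                out -> out.write("<html></html>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(html);
    }
}
```

Keeping the library call behind a small callback also makes the surrounding code easy to test without a real PDF file.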

Apply for a Temporary License

To remove the evaluation message from the generated documents, or to lift the functional limitations, you can request a 30-day trial license.