C#: Convert PDF to HTML

PDF documents are a popular choice for sharing information because they work across platforms and preserve the original layout and formatting. However, as the web continues to evolve, there is growing demand for content that can be easily integrated into websites and other online platforms. Converting PDF files to HTML, a more flexible and accessible format, makes PDF-based information easier to use, share, and reuse on the web. In this article, we will demonstrate how to convert PDF files to HTML format in C# using Spire.PDF for .NET.

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded manually or installed via NuGet.

PM> Install-Package Spire.PDF

Convert PDF to HTML in C#

To convert a PDF document to HTML format, you can use the PdfDocument.SaveToFile(string fileName, FileFormat.HTML) method provided by Spire.PDF for .NET. The detailed steps are as follows.

  • Create an instance of the PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile(string fileName) method.
  • Save the PDF document to HTML format using the PdfDocument.SaveToFile(string fileName, FileFormat.HTML) method.

C#
using Spire.Pdf;

namespace ConvertPdfToHtml
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of the PdfDocument class
            PdfDocument doc = new PdfDocument();

            // Load a PDF document
            doc.LoadFromFile("Sample.pdf");

            // Save the PDF document to HTML format
            doc.SaveToFile("PdfToHtml.html", FileFormat.HTML);
            doc.Close();
        }
    }
}
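
The same three calls can be applied to a whole folder of PDF files. The following is a minimal sketch, not an official Spire.PDF sample: the folder names "Input" and "Output" and the helper GetHtmlPath are hypothetical, and only the LoadFromFile, SaveToFile, and Close calls shown above are used from the library.

```csharp
using System;
using System.IO;
using Spire.Pdf;

namespace ConvertPdfFolderToHtml
{
    internal static class Program
    {
        // Hypothetical helper: maps "Input/Report.pdf" to "<outputDir>/Report.html"
        internal static string GetHtmlPath(string pdfPath, string outputDir)
        {
            string name = Path.GetFileNameWithoutExtension(pdfPath);
            return Path.Combine(outputDir, name + ".html");
        }

        static void Main(string[] args)
        {
            // Assumed folder names; adjust them to your project
            string inputDir = "Input";
            string outputDir = "Output";

            if (!Directory.Exists(inputDir))
            {
                Console.Error.WriteLine("Input folder not found: " + inputDir);
                return;
            }
            Directory.CreateDirectory(outputDir);

            foreach (string pdfPath in Directory.GetFiles(inputDir, "*.pdf"))
            {
                PdfDocument doc = new PdfDocument();
                try
                {
                    // Load one PDF and save it as HTML under the output folder
                    doc.LoadFromFile(pdfPath);
                    doc.SaveToFile(GetHtmlPath(pdfPath, outputDir), FileFormat.HTML);
                }
                catch (Exception ex)
                {
                    // Report the failure and continue with the next file
                    Console.Error.WriteLine("Skipped " + pdfPath + ": " + ex.Message);
                }
                finally
                {
                    doc.Close();
                }
            }
        }
    }
}
```

Creating a new PdfDocument per file and closing it in a finally block keeps one failed conversion from affecting the remaining files.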


Set Conversion Options When Converting PDF to HTML in C#

The PdfConvertOptions.SetPdfToHtmlOptions() method allows you to customize the conversion options when transforming PDF files to HTML. This method takes several parameters that you can use to configure the conversion process, such as:

  • useEmbeddedSvg (bool): Indicates whether to embed SVG in the resulting HTML file.
  • useEmbeddedImg (bool): Indicates whether to embed images in the resulting HTML file. This option is applicable only when useEmbeddedSvg is set to false.
  • maxPageOneFile (int): Specifies the maximum number of pages to be included per HTML file. This option is applicable only when useEmbeddedSvg is set to false.
  • useHighQualityEmbeddedSvg (bool): Indicates whether to use high-quality embedded SVG in the resulting HTML file. This option is applicable when useEmbeddedSvg is set to true.
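
To make the effect of maxPageOneFile concrete, the sketch below computes the smallest number of HTML files that can hold a document when each file is limited to a given number of pages. This is plain arithmetic, not a Spire.PDF API: the helper MinHtmlFileCount is hypothetical, and how the library actually names or splits its output files is not confirmed here.

```csharp
using System;

namespace PdfToHtmlOptionsSketch
{
    internal static class Program
    {
        // Smallest number of files that can hold pageCount pages when each
        // file holds at most maxPageOneFile pages (ceiling division).
        internal static int MinHtmlFileCount(int pageCount, int maxPageOneFile)
        {
            if (pageCount < 0)
                throw new ArgumentOutOfRangeException(nameof(pageCount));
            if (maxPageOneFile <= 0)
                throw new ArgumentOutOfRangeException(nameof(maxPageOneFile));

            return (pageCount + maxPageOneFile - 1) / maxPageOneFile;
        }

        static void Main(string[] args)
        {
            Console.WriteLine(MinHtmlFileCount(10, 3)); // prints 4
            Console.WriteLine(MinHtmlFileCount(10, 1)); // prints 10
        }
    }
}
```

For example, a 10-page PDF converted with maxPageOneFile set to 1 needs at least 10 HTML files.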

The following steps explain how to customize the conversion options when transforming a PDF to HTML using Spire.PDF for .NET.

  • Create an instance of the PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile(string fileName) method.
  • Get the PdfConvertOptions object using the PdfDocument.ConvertOptions property.
  • Set the PDF to HTML conversion options using the PdfConvertOptions.SetPdfToHtmlOptions(bool useEmbeddedSvg, bool useEmbeddedImg, int maxPageOneFile, bool useHighQualityEmbeddedSvg) method.
  • Save the PDF document to HTML format using the PdfDocument.SaveToFile(string fileName, FileFormat.HTML) method.

C#
using Spire.Pdf;

namespace ConvertPdfToHtmlWithCustomOptions
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of the PdfDocument class
            PdfDocument doc = new PdfDocument();

            // Load a PDF document
            doc.LoadFromFile("Sample.pdf");

            // Set the conversion options: no embedded SVG, embed images, and limit each HTML file to one page
            PdfConvertOptions pdfToHtmlOptions = doc.ConvertOptions;
            pdfToHtmlOptions.SetPdfToHtmlOptions(false, true, 1, false);

            // Save the PDF document to HTML format
            doc.SaveToFile("PdfToHtmlWithCustomOptions.html", FileFormat.HTML);
            doc.Close();
        }
    }
}
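
The example above turns SVG embedding off. As a contrasting sketch, the same four-parameter method can be called with useEmbeddedSvg and useHighQualityEmbeddedSvg set to true. The values passed for useEmbeddedImg and maxPageOneFile are placeholders, since the parameter descriptions state that those two options apply only when useEmbeddedSvg is false. The output file name is an assumption.

```csharp
using Spire.Pdf;

namespace ConvertPdfToHtmlWithEmbeddedSvg
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of the PdfDocument class and load a PDF document
            PdfDocument doc = new PdfDocument();
            doc.LoadFromFile("Sample.pdf");

            // Arguments, in order: useEmbeddedSvg, useEmbeddedImg, maxPageOneFile, useHighQualityEmbeddedSvg.
            // The second and third values are placeholders; they are described as
            // applicable only when useEmbeddedSvg is false.
            PdfConvertOptions options = doc.ConvertOptions;
            options.SetPdfToHtmlOptions(true, false, 1, true);

            // Save the PDF document to HTML format (file name is an assumption)
            doc.SaveToFile("PdfToHtmlWithEmbeddedSvg.html", FileFormat.HTML);
            doc.Close();
        }
    }
}
```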

Convert PDF to HTML Stream in C#

Instead of saving a PDF document to an HTML file, you can save it to an HTML stream by using the PdfDocument.SaveToStream(Stream stream, FileFormat.HTML) method. The detailed steps are as follows.

  • Create an instance of the PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile(string fileName) method.
  • Create an instance of the MemoryStream class.
  • Save the PDF document to an HTML stream using the PdfDocument.SaveToStream(Stream stream, FileFormat.HTML) method.

C#
using Spire.Pdf;
using System.IO;

namespace ConvertPdfToHtmlStream
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of the PdfDocument class
            PdfDocument doc = new PdfDocument();

            // Load a PDF document
            doc.LoadFromFile("Sample.pdf");

            // Save the PDF document to an HTML stream held in memory
            using (var htmlStream = new MemoryStream())
            {
                doc.SaveToStream(htmlStream, FileFormat.HTML);

                // You can now do something with the HTML stream, such as writing it to a file
                using (var outputFile = new FileStream("PdfToHtmlStream.html", FileMode.Create))
                {
                    // Rewind the stream before copying, because saving leaves the position at the end
                    htmlStream.Seek(0, SeekOrigin.Begin);
                    htmlStream.CopyTo(outputFile);
                }
            }
            }

            doc.Close();
        }
    }
}
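
A stream is also convenient when the HTML is needed as a string, for example to return it from a web endpoint. The following is a minimal sketch under two assumptions: the helper ReadAllText is hypothetical, and the generated HTML is assumed to be UTF-8 encoded, which this article does not confirm.

```csharp
using System;
using System.IO;
using System.Text;
using Spire.Pdf;

namespace ConvertPdfToHtmlString
{
    internal static class Program
    {
        // Hypothetical helper: rewinds the stream and decodes it as text.
        // UTF-8 is an assumed encoding; a byte order mark, if present, overrides it.
        internal static string ReadAllText(Stream stream)
        {
            stream.Seek(0, SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
            {
                return reader.ReadToEnd();
            }
        }

        static void Main(string[] args)
        {
            // Create an instance of the PdfDocument class and load a PDF document
            PdfDocument doc = new PdfDocument();
            doc.LoadFromFile("Sample.pdf");

            using (var htmlStream = new MemoryStream())
            {
                // Save the PDF document to an HTML stream, then read it back as a string
                doc.SaveToStream(htmlStream, FileFormat.HTML);
                string html = ReadAllText(htmlStream);
                Console.WriteLine("HTML length in characters: " + html.Length);
            }

            doc.Close();
        }
    }
}
```

Passing leaveOpen: true keeps the MemoryStream usable after the reader is disposed, so the caller remains responsible for closing it.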

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the functional limitations, you can request a 30-day trial license.