Read PDF Images and Text in C#/VB.NET

2011-06-24 05:28:41 Written by support iceblue

This section introduces a solution for reading PDF files in C# and VB.NET with a .NET PDF component. With this solution, you can complete the reading task in only three steps. You can also save the extracted images in many common formats, such as JPG/JPEG, PNG, BMP, TIFF and GIF. Text written from right to left, for example Hebrew and Arabic, can also be read from the PDF.
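The detailed code in this article saves every extracted image as PNG. If you prefer one of the other formats listed above, one option is to choose the System.Drawing.Imaging.ImageFormat from the output file's extension and pass it to Image.Save(string filename, ImageFormat format). The helper below is a minimal sketch under that assumption: the class name ImageFormatHelper and its FromFileName method are hypothetical and are not part of Spire.PDF. It relies only on Path.GetExtension and the standard ImageFormat properties.

```csharp
using System;
using System.Drawing.Imaging;
using System.IO;

// Hypothetical helper (not part of Spire.PDF): picks an ImageFormat
// from the extension of the file name you want to save to.
public static class ImageFormatHelper
{
    public static ImageFormat FromFileName(string fileName)
    {
        // Path.GetExtension returns the extension including the dot, e.g. ".jpg".
        string extension = Path.GetExtension(fileName).ToLowerInvariant();
        switch (extension)
        {
            case ".jpg":
            case ".jpeg":
                return ImageFormat.Jpeg;
            case ".png":
                return ImageFormat.Png;
            case ".bmp":
                return ImageFormat.Bmp;
            case ".tif":
            case ".tiff":
                return ImageFormat.Tiff;
            case ".gif":
                return ImageFormat.Gif;
            default:
                // Fail loudly rather than silently writing a mismatched format.
                throw new NotSupportedException("Unsupported image extension: " + extension);
        }
    }
}
```

For example, inside the image loop of the C# code you could build the name with String.Format("Image-{0}.jpg", index++) and call image.Save(imageFileName, ImageFormatHelper.FromFileName(imageFileName)), so the file extension and the encoded format always match.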

Spire.PDF for .NET, a professional .NET PDF component for reading, editing and manipulating PDF files, lets you read a PDF file quickly. First, call the PdfDocument.LoadFromFile(string filename) method to load your PDF file from the file system. Then, call the ExtractText and ExtractImages methods on each page to extract the text and images. Finally, use System.IO.File.WriteAllText(string path, string contents) and Image.Save(string filename, ImageFormat format) to save the extracted text and images respectively. Please download Spire.PDF for .NET and see the picture below:

Read PDF Text and Images

Detailed Code:

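[C#]
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;
using Spire.Pdf;

//Create a PDF document and load the file.
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"..\Sample_image.pdf");
StringBuilder buffer = new StringBuilder();
IList<Image> images = new List<Image>();
//Extract the text and images from each page.
foreach (PdfPageBase page in doc.Pages)
{
    buffer.Append(page.ExtractText());
    foreach (Image image in page.ExtractImages())
    {
        images.Add(image);
    }
}

doc.Close();
//Save the text.
String fileName = "TextInPdf.txt";
File.WriteAllText(fileName, buffer.ToString());
//Save each image as a PNG file, then release it.
int index = 0;
foreach (Image image in images)
{
    String imageFileName
        = String.Format("Image-{0}.png", index++);
    image.Save(imageFileName, ImageFormat.Png);
    image.Dispose();
}
//Open the text file with its default application.
//UseShellExecute = true is required on .NET Core and .NET 5+ to open a document.
Process.Start(new ProcessStartInfo(fileName) { UseShellExecute = true });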
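[VB.NET]
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO
Imports System.Text
Imports Spire.Pdf

'Create a PDF document and load the file.
Dim doc As New PdfDocument()
doc.LoadFromFile("..\Sample_image.pdf")
Dim buffer As New StringBuilder()
Dim images As IList(Of Image) = New List(Of Image)()
'Extract the text and images from each page.
For Each page As PdfPageBase In doc.Pages
    buffer.Append(page.ExtractText())
    For Each image As Image In page.ExtractImages()
        images.Add(image)
    Next image
Next page

doc.Close()
'Save the text.
Dim fileName As String = "TextInPdf.txt"
File.WriteAllText(fileName, buffer.ToString())
'Save each image as a PNG file, then release it.
Dim index As Integer = 0
For Each image As Image In images
    Dim imageFileName As String = String.Format("Image-{0}.png", index)
    index += 1
    image.Save(imageFileName, ImageFormat.Png)
    image.Dispose()
Next image
'Open the text file with its default application.
'UseShellExecute = True is required on .NET Core and .NET 5+ to open a document.
Process.Start(New ProcessStartInfo(fileName) With {.UseShellExecute = True})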

Spire.PDF is a .NET PDF component that lets users perform a wide range of PDF processing tasks directly, such as generating, reading, writing and modifying PDF documents in WPF, .NET and Silverlight applications.

Last modified on Thursday, 24 November 2022 08:52