How to Extract Text from a PDF Document in C# and VB.NET

In a PDF document, most of the content is usually text. Readers who find that content useful, or who want to reuse it as a template, may need to extract the text from the PDF and save it in another document format.

Spire.PDF provides a function for extracting text from a PDF document so that it can be saved as a .txt file. This guide demonstrates how to extract text with C#/VB.NET via Spire.PDF.

Step 1: Load PDF Document

Declare a new PdfDocument object, then call its LoadFromFile() method to load the document from which text will be extracted. The method takes the file name as a string parameter.

[C#]
            PdfDocument document = new PdfDocument();
            document.LoadFromFile(@"E:\work\C pointer.pdf");
[VB.NET]
            Dim document As New PdfDocument()
            document.LoadFromFile("E:\work\C pointer.pdf")

Step 2: Extract Text from PDF

Declare a new StringBuilder named content, which represents a mutable string of characters. Then append the text extracted from the PDF to it with the content.Append() method. In this example, text is extracted from the first page only, which has index 0 because page indexes are zero-based.

[C#]
            StringBuilder content = new StringBuilder();
            content.Append(document.Pages[0].ExtractText());
[VB.NET]
            Dim content As New StringBuilder()
            content.Append(document.Pages(0).ExtractText())

Step 3: Save and Launch Extracted Text

Define a file name string, then call File.WriteAllText(), which creates the file (overwriting any existing file with the same name), writes the specified string to it, and closes it. The parameters are the file name string and the content string. Finally, launch the saved file with Process.Start().

[C#]
            string fileName = "TextFromPDF.txt";
            File.WriteAllText(fileName, content.ToString());
            System.Diagnostics.Process.Start(fileName);
[VB.NET]
            Dim fileName As String = "TextFromPDF.txt"
            File.WriteAllText(fileName, content.ToString())
            System.Diagnostics.Process.Start(fileName)
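
On newer runtimes the launch call may need a small change. The sketch below assumes the project targets .NET Core or .NET 5 and later, where passing a document name straight to Process.Start() typically fails because UseShellExecute defaults to false there. Setting it to true lets the operating system open the file with its associated application. The class name LaunchExample is hypothetical.

[C#]
            using System.Diagnostics;

            class LaunchExample
            {
                static void Main()
                {
                    // Assumption: TextFromPDF.txt was already written by the step above.
                    ProcessStartInfo startInfo = new ProcessStartInfo("TextFromPDF.txt")
                    {
                        // Ask the OS shell to open the file with its default application.
                        UseShellExecute = true
                    };
                    Process.Start(startInfo);
                }
            }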

Note: To extract text from all pages instead of a single page, use the following code.

[C#]
            foreach (PdfPageBase page in document.Pages)
            {
                content.Append(page.ExtractText());
            }
[VB.NET]
            For Each page As PdfPageBase In document.Pages
                content.Append(page.ExtractText())
            Next page
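
For reference, the three steps and the all-pages loop can be combined into one small console program. This is a minimal sketch, not an official sample: it assumes the Spire.PDF assembly is referenced and that PdfDocument and PdfPageBase live in the Spire.Pdf namespace, the class name ExtractTextExample is hypothetical, and the input path is simply the sample path used earlier. AppendLine is used here instead of Append so that each page's text ends with a line break, which is a choice made for this sketch.

[C#]
            using System.IO;
            using System.Text;
            using Spire.Pdf; // assumed namespace for PdfDocument and PdfPageBase

            class ExtractTextExample
            {
                static void Main()
                {
                    // Step 1: load the PDF (sample path; replace with your own file).
                    PdfDocument document = new PdfDocument();
                    document.LoadFromFile(@"E:\work\C pointer.pdf");

                    // Step 2: collect the text of every page.
                    StringBuilder content = new StringBuilder();
                    foreach (PdfPageBase page in document.Pages)
                    {
                        content.AppendLine(page.ExtractText());
                    }

                    // Step 3: write the text to a file and open it.
                    string fileName = "TextFromPDF.txt";
                    File.WriteAllText(fileName, content.ToString());
                    System.Diagnostics.Process.Start(fileName);
                }
            }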

Effective Screenshot:

Spire.PDF is a PDF document creation component that enables .NET applications to read, write and manipulate PDF documents without using Adobe Acrobat. The new version also adds support for the Silverlight platform.