Extract Text from Word Document in C#, VB.NET

Text can be extracted from a Word document and saved in another file (for example, a TXT file) for later use. This guide presents a convenient way to extract Word text in C# and VB.NET.

A Word document typically contains several kinds of content, such as text, images and tables. Some of this content can be extracted and reused in other documents or files. This guide shows how to extract text from a Word document and save it to a TXT file in C# and VB.NET with Spire.Doc for .NET. The following screenshot shows part of the text extracted from Word.

Extract Word Text

In Spire.Doc for .NET, text is stored in the Paragraph objects of each Section. Therefore, you first get the sections and paragraphs of the document and then read the text of each paragraph. The following steps describe the extraction in detail.

First, load the document by calling the LoadFromFile method of the Document class with the string parameter fileName. Second, create a StringBuilder instance to hold the text that will be extracted. Third, use foreach statements to visit each paragraph of each section in the document, and call the AppendLine method of the StringBuilder class with Paragraph.Text to append the text of every paragraph to the StringBuilder instance. Finally, call the File.WriteAllText method with the parameters string path and string contents to create a new file that holds the extracted text. The complete code follows.

[C#]
using Spire.Doc;
using Spire.Doc.Documents;
using System.Text;
using System.IO;

namespace ExtractTextfromWord
{
    class ExtractText
    {
        static void Main(string[] args)
        {
            //Load Document
            Document document = new Document();
            document.LoadFromFile(@"E:\Work\Documents\WordDocuments\Spire.Doc for .NET.docx");

            //Initialize StringBuilder Instance
            StringBuilder sb = new StringBuilder();

            //Extract Text from Word and Save to StringBuilder Instance
            foreach (Section section in document.Sections)
            {
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    sb.AppendLine(paragraph.Text);
                }
            }

            //Create a New TXT File to Save Extracted Text, Then Open the Same File
            File.WriteAllText("Extract.txt", sb.ToString());
            System.Diagnostics.Process.Start("Extract.txt");
        }
    }
}
[VB.NET]
Imports Spire.Doc
Imports Spire.Doc.Documents
Imports System.Text
Imports System.IO

Namespace ExtractTextfromWord
    Friend Class ExtractText
        Shared Sub Main(ByVal args() As String)
            'Load Document
            Dim document As New Document()
            document.LoadFromFile("E:\Work\Documents\WordDocuments\Spire.Doc for .NET.docx")

            'Initialize StringBuilder Instance
            Dim sb As New StringBuilder()

            'Extract Text from Word and Save to StringBuilder Instance
            For Each section As Section In document.Sections
                For Each paragraph As Paragraph In section.Paragraphs
                    sb.AppendLine(paragraph.Text)
                Next paragraph
            Next section

            'Create a New TXT File to Save Extracted Text, Then Open the Same File
            File.WriteAllText("Extract.txt", sb.ToString())
            System.Diagnostics.Process.Start("Extract.txt")
        End Sub
    End Class
End Namespace
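
The collection and saving steps can also be tried without a Word file. The following C# sketch is only an illustration under stated assumptions: the ParagraphText class and its Join and Save methods are hypothetical helpers, not part of Spire.Doc, and the sample strings stand in for the Paragraph.Text values read in the loops above. It shows one possible variation, skipping blank paragraphs and writing the file with an explicit encoding.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Spire.Doc. It works on plain strings,
// such as the values read from Paragraph.Text.
static class ParagraphText
{
    // Joins paragraph texts, one per line. Blank paragraphs are dropped
    // unless skipBlank is set to false.
    public static string Join(IEnumerable<string> paragraphs, bool skipBlank = true)
    {
        var sb = new StringBuilder();
        foreach (string text in paragraphs)
        {
            if (skipBlank && string.IsNullOrWhiteSpace(text))
            {
                continue;
            }
            sb.AppendLine(text);
        }
        return sb.ToString();
    }

    // Writes the text as UTF-8 with a byte order mark, so that editors
    // can detect the encoding of the TXT file.
    public static void Save(string path, string contents)
    {
        File.WriteAllText(path, contents, Encoding.UTF8);
    }
}

class Demo
{
    static void Main()
    {
        // Sample strings standing in for paragraph.Text values.
        var paragraphs = new List<string>
        {
            "Title", "", "First paragraph.", "   ", "Second paragraph."
        };

        string text = ParagraphText.Join(paragraphs);
        ParagraphText.Save("Extract.txt", text);
        Console.Write(text);
    }
}
```

In the extraction code above, the same effect could be obtained by checking paragraph.Text with string.IsNullOrWhiteSpace before calling AppendLine. Whether blank paragraphs should be kept depends on how closely the TXT file needs to mirror the layout of the Word document.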

Spire.Doc is a professional standalone component for manipulating MS Word documents without automation. It enables developers to generate, read, write and modify Word documents in their .NET, WPF and Silverlight applications.