C#/VB.NET: Extract Text and Images from Word

When you receive or download a Word document, you may need to extract its content for reuse elsewhere. In this article, you will learn how to programmatically extract text and images from a Word document using Spire.Doc for .NET.

Install Spire.Doc for .NET

To begin with, add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Doc

Extract Text from a Word Document

The following steps extract the text from a Word document and save it to a TXT file.

  • Create a Document instance.
  • Load a sample Word document using the Document.LoadFromFile() method.
  • Create a StringBuilder instance.
  • Loop through each paragraph of each section in the document.
  • Get the text of each paragraph using the Paragraph.Text property, and append it to the StringBuilder instance using the StringBuilder.AppendLine() method.
  • Write the extracted text to a new TXT file using the File.WriteAllText() method.
using Spire.Doc;
using Spire.Doc.Documents;
using System.Text;
using System.IO;

namespace ExtractTextfromWord
{
    class ExtractText
    {
        static void Main(string[] args)
        {
            //Create a Document instance
            Document document = new Document();

            //Load a sample Word document 
            document.LoadFromFile("input.docx");

            //Create a StringBuilder instance
            StringBuilder sb = new StringBuilder();

            //Extract text from Word and save to StringBuilder instance
            foreach (Section section in document.Sections)
            {
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    sb.AppendLine(paragraph.Text);
                }
            }

            //Create a new txt file to save the extracted text
            File.WriteAllText("Extract.txt", sb.ToString());
        }
    }
}
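
The loop above visits only the paragraphs that belong directly to each section, so text held elsewhere (for example, inside table cells) may not be collected. As a minimal sketch, assuming the installed Spire.Doc version provides the Document.GetText() method, the text of the whole document can be read in a single call. The class name ExtractAllText and the output file name ExtractAll.txt are hypothetical and chosen only for illustration.

```csharp
using Spire.Doc;
using System.IO;

namespace ExtractTextfromWord
{
    class ExtractAllText
    {
        static void Main(string[] args)
        {
            //Create a Document instance and load the same sample file as above
            Document document = new Document();
            document.LoadFromFile("input.docx");

            //Assumption: GetText() returns the text of the entire document as one string
            string text = document.GetText();

            //Write the text to a new file (hypothetical file name)
            File.WriteAllText("ExtractAll.txt", text);
        }
    }
}
```

The paragraph-by-paragraph loop remains the better choice when you need to filter or post-process individual paragraphs before writing them out.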

Extract Images from a Word Document

The following steps extract all images from the paragraphs of a Word document and save each one as a PNG file.

  • Create a Document instance and load a sample Word document.
  • Loop through each paragraph of each section in the document.
  • Loop through each child object of each paragraph.
  • Check whether the child object is a picture. If it is, save the image to a file using the DocPicture.Image.Save(String, ImageFormat) method.
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using System;

namespace ExtractImage
{
    class Program
    {
        static void Main(string[] args)
        {
            //Load a Word document
            Document document = new Document("input.docx");
            int index = 0;

            //Get each section of document
            foreach (Section section in document.Sections)
            {
                //Get each paragraph of section
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    //Get each document object of a specific paragraph
                    foreach (DocumentObject docObject in paragraph.ChildObjects)
                    {
                        //If the object is a picture, save it as a PNG file named after its index
                        if (docObject.DocumentObjectType == DocumentObjectType.Picture)
                        {
                            DocPicture picture = docObject as DocPicture;
                            picture.Image.Save(string.Format("image_{0}.png", index), System.Drawing.Imaging.ImageFormat.Png);
                            index++;
                        }
                    }
                }
            }
        }
    }
}
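
The listing above checks only the direct children of section-level paragraphs, so pictures nested deeper in the document (for example, inside table cells) may be missed. The sketch below is one possible way to reach them: it walks the document tree breadth-first with a queue instead of three fixed loops. It assumes that the ICompositeObject interface lives in the Spire.Doc.Interface namespace and that Document implements it. The class name ExtractNestedImages and the nested_image_ file prefix are hypothetical.

```csharp
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using Spire.Doc.Interface;
using System.Collections.Generic;
using System.Drawing.Imaging;

namespace ExtractImage
{
    class ExtractNestedImages
    {
        static void Main(string[] args)
        {
            //Load a Word document
            Document document = new Document("input.docx");
            int index = 0;

            //Queue of container objects still to visit, starting at the document root
            Queue<ICompositeObject> nodes = new Queue<ICompositeObject>();
            nodes.Enqueue(document);

            while (nodes.Count > 0)
            {
                ICompositeObject node = nodes.Dequeue();

                //Inspect every child of the current container
                foreach (DocumentObject child in node.ChildObjects)
                {
                    if (child.DocumentObjectType == DocumentObjectType.Picture)
                    {
                        //Save the picture as a PNG file, as in the listing above
                        DocPicture picture = child as DocPicture;
                        picture.Image.Save(string.Format("nested_image_{0}.png", index), ImageFormat.Png);
                        index++;
                    }
                    else if (child is ICompositeObject)
                    {
                        //Containers are queued so their children are visited later
                        nodes.Enqueue(child as ICompositeObject);
                    }
                }
            }
        }
    }
}
```

Because the traversal is breadth-first, the index in each file name reflects the order of discovery and may not match the order in which the images appear on the page.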

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or lift the feature limitations, you can request a 30-day trial license.