C# Read Content from a Word Document

Reading content from a Word document programmatically is useful in many work and study tasks. Reading a single page lets you quickly browse and summarize key information, reading a section lets you focus on a specific topic, and reading the entire document gives you the complete content for analysis. This article shows how to use Spire.Doc for .NET to read a page, a section, and the entire content of a Word document in a C# project.

Install Spire.Doc for .NET

To begin, add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be downloaded from the product's download page or installed via NuGet.

PM> Install-Package Spire.Doc

Read a Page from a Word Document in C#

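By using the FixedLayoutDocument and FixedLayoutPage classes, you can retrieve the content of a specified page. To make the extracted content easy to inspect, the sample code stores it in a new Word document. Note that the sample copies whole paragraphs and only looks at the first column of the page, so a paragraph that starts or ends on a neighboring page is copied in full. The detailed steps are as follows: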

  • Create a Document object.
  • Load a Word document using the Document.LoadFromFile() method.
  • Create a FixedLayoutDocument object.
  • Retrieve the FixedLayoutPage object of a page in the document.
  • Access the Section where the page is located through the FixedLayoutPage.Section property.
  • Get the index position of the first paragraph on the page within the section.
  • Get the index position of the last paragraph on the page within the section.
  • Create another Document object.
  • Add a new section using Document.AddSection().
  • Clone the properties of the original section to the new section using the Section.CloneSectionPropertiesTo(newSection) method.
  • Copy the content of the page from the original document to the new document.
  • Save the resulting document using the Document.SaveToFile() method.
using Spire.Doc;
using Spire.Doc.Pages;
using Spire.Doc.Documents;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new document object
            Document document = new Document();

            // Load document content from the specified file
            document.LoadFromFile("Sample.docx");

            // Create a fixed layout document object
            FixedLayoutDocument layoutDoc = new FixedLayoutDocument(document);

            // Get the first page
            FixedLayoutPage page = layoutDoc.Pages[0];

            // Get the section where the page is located
            Section section = page.Section;

            // Get the first paragraph of the page
            Paragraph paragraphStart = page.Columns[0].Lines[0].Paragraph;
            int startIndex = 0;
            if (paragraphStart != null)
            {
                // Get the index of the paragraph in the section
                startIndex = section.Body.ChildObjects.IndexOf(paragraphStart);
            }

            // Get the last paragraph of the page
            Paragraph paragraphEnd = page.Columns[0].Lines[page.Columns[0].Lines.Count - 1].Paragraph;

            int endIndex = 0;
            if (paragraphEnd != null)
            {
                // Get the index of the paragraph in the section
                endIndex = section.Body.ChildObjects.IndexOf(paragraphEnd);
            }

            // Create a new document object
            Document newdoc = new Document();

            // Add a new section
            Section newSection = newdoc.AddSection();

            // Clone the properties of the original section to the new section
            section.CloneSectionPropertiesTo(newSection);

            // Copy the content of the page from the original document to the new document
            for (int i = startIndex; i <= endIndex ; i++)
            {
                newSection.Body.ChildObjects.Add(section.Body.ChildObjects[i].Clone());
            }

            // Save the new document to a specified file
            newdoc.SaveToFile("ReadOnePageContent.docx", Spire.Doc.FileFormat.Docx);

            // Close and release the new document
            newdoc.Close();
            newdoc.Dispose();

            // Close and release the original document
            document.Close();
            document.Dispose();
        }
    }
}
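
The copy loop in the sample above assumes that both paragraphs are direct children of the section body. If one of them sits elsewhere, for example inside a table cell, IndexOf would presumably return -1 (an assumption based on usual .NET collection behavior, not something confirmed here for Spire.Doc), and the loop would then fail. A minimal sketch of a range check that could run before the loop is shown below; PageRangeCheck and IsCopyable are hypothetical names, not part of Spire.Doc.

```csharp
internal static class PageRangeCheck
{
    // Hypothetical helper (not part of Spire.Doc).
    // Returns true only when [startIndex, endIndex] is a valid inclusive
    // range over a collection that holds `count` items.
    public static bool IsCopyable(int startIndex, int endIndex, int count)
    {
        // A negative index means the paragraph was not found in the collection
        if (startIndex < 0 || endIndex < 0)
        {
            return false;
        }

        // The range must not be reversed and must stay inside the collection
        return startIndex <= endIndex && endIndex < count;
    }
}
```

For instance, IsCopyable(startIndex, endIndex, section.Body.ChildObjects.Count) could guard the for loop, so that the page is copied only when the range is valid.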

Read a Section from a Word Document in C#

By using Document.Sections[index], you can retrieve a specific Section object that contains the header, footer, and body content. This example provides a simple way to copy all content of a section to another document. The detailed steps are as follows:

  • Create a Document object.
  • Use the Document.LoadFromFile() method to load a Word document.
  • Use Document.Sections[1] to retrieve the second section of the document (section indexes are zero-based, so the document must contain at least two sections).
  • Create another new Document object.
  • Use the Document.CloneDefaultStyleTo(newdoc) method to clone the default style of the original document to the new document.
  • Use newdoc.Sections.Add(section.Clone()) to clone the content of the second section of the original document into the new document.
  • Use the Document.SaveToFile() method to save the resulting document.
using Spire.Doc;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new document object
            Document document = new Document();

            // Load a Word document from a file
            document.LoadFromFile("Sample.docx");

            // Get the second section of the document
            Section section = document.Sections[1];

            // Create a new document object
            Document newdoc = new Document();

            // Clone the default style to the new document
            document.CloneDefaultStyleTo(newdoc);

            // Clone the second section to the new document
            newdoc.Sections.Add(section.Clone());

            // Save the new document to a file
            newdoc.SaveToFile("ReadOneSectionContent.docx", Spire.Doc.FileFormat.Docx);

            // Close and release the new document object
            newdoc.Close();
            newdoc.Dispose();

            // Close and release the original document object
            document.Close();
            document.Dispose();
        }
    }
}
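
The sample above reads Sections[1] directly, which only works when the document has at least two sections. A minimal sketch of a guarded variant follows, assuming Document.Sections exposes a Count property as Spire.Doc collections usually do; SectionCopy and TryCopySection are hypothetical names, not part of Spire.Doc.

```csharp
using Spire.Doc;

internal static class SectionCopy
{
    // Hypothetical helper (not part of Spire.Doc): clones the section at the
    // given zero-based index into the target document.
    // Returns false instead of throwing when the index is out of range.
    public static bool TryCopySection(Document source, Document target, int index)
    {
        if (index < 0 || index >= source.Sections.Count)
        {
            return false;
        }

        // Same cloning call as in the sample above
        target.Sections.Add(source.Sections[index].Clone());
        return true;
    }
}
```

In the sample, a call such as SectionCopy.TryCopySection(document, newdoc, 1) could replace the direct index access, with the caller deciding what to do when it returns false.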

Read the Entire Content from a Word Document in C#

This example demonstrates reading the entire content of a document by iterating through each section of the original document and cloning each section into a new document. The detailed steps are as follows:

  • Create a Document object.
  • Use the Document.LoadFromFile() method to load a Word document.
  • Create another new Document object.
  • Use the Document.CloneDefaultStyleTo(newdoc) method to clone the default style of the original document to the new document.
  • Iterate through each section of the original document using a foreach loop and clone each section into the new document.
  • Use the Document.SaveToFile() method to save the resulting document.
using Spire.Doc;

namespace SpireDocDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a new document object
            Document document = new Document();

            // Load a Word document from a file
            document.LoadFromFile("Sample.docx");

            // Create a new document object
            Document newdoc = new Document();

            // Clone the default style to the new document
            document.CloneDefaultStyleTo(newdoc);

            // Iterate through each section in the original document and clone it to the new document
            foreach (Section sourceSection in document.Sections)
            {
                newdoc.Sections.Add(sourceSection.Clone());
            }

            // Save the new document to a file
            newdoc.SaveToFile("ReadEntireDocumentContent.docx", Spire.Doc.FileFormat.Docx);

            // Close and release the new document object
            newdoc.Close();
            newdoc.Dispose();

            // Close and release the original document object
            document.Close();
            document.Dispose();
        }
    }
}
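
The samples in this article copy content into a new document. When only the plain text is needed, for example for searching or logging, it can be read as a string instead. The following is a minimal sketch, assuming the Document.GetText() method of Spire.Doc; PlainTextReader and ReadAllText are hypothetical names introduced for illustration.

```csharp
using Spire.Doc;

internal static class PlainTextReader
{
    // Hypothetical helper (not part of Spire.Doc): loads a Word file and
    // returns its text content as a single string.
    public static string ReadAllText(string path)
    {
        Document document = new Document();
        try
        {
            // Load the Word document from the given path
            document.LoadFromFile(path);

            // Extract the text of the whole document
            return document.GetText();
        }
        finally
        {
            // Close and release the document, as in the samples above
            document.Close();
            document.Dispose();
        }
    }
}
```

A call such as Console.WriteLine(PlainTextReader.ReadAllText("Sample.docx")) would then print the text. With an unlicensed build, the returned text may also contain the evaluation message.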

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the functional limitations, please request a 30-day trial license.