Thursday, 13 February 2014 08:56

Extract Text from Text Boxes in Word Document

Written by  Administrator

A text box lets the user enter text for the program to use, and the text it already contains can also be extracted. This guide shows how to extract text from the text boxes in a Word document in C# using Spire.Doc for .NET.

First, look at the text boxes in the Word document.

[Image: extract text from textbox]

Secondly, download Spire.Doc and install it on your system. The Spire.Doc installation is clean and professional, and it is wrapped in an MSI installer.

Then add Spire.Doc.dll as a reference from the downloaded Bin folder through the path below: "..\Spire.Doc\Bin\NET4.0\Spire.Doc.dll".

Now for the steps to extract text from text boxes.

Step 1: Load a Word document from a file.

[C#]
Document document = new Document();
document.LoadFromFile(@"..\..\Test.docx");

Step 2: Check whether the document contains any text boxes.

[C#]
//Verify whether the document contains a textbox or not
if (document.TextBoxes.Count > 0)

Step 3: Initialize a StreamWriter to save the text that will be extracted next.

[C#]
using (StreamWriter sw = File.CreateText("result.txt"))
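
To make the role of the using block in Step 3 concrete, here is a minimal standalone sketch that relies only on System.IO and does not touch Spire.Doc. The file name "sketch.txt" and the class name StreamWriterSketch are hypothetical and chosen only for illustration. It shows that the writer is flushed and closed when the block ends, and that Write adds no separator between successive strings, which is worth keeping in mind when several paragraphs are written one after another.

```csharp
using System;
using System.IO;

class StreamWriterSketch
{
    static void Main()
    {
        // "sketch.txt" is a hypothetical file name used only for this example.
        string path = "sketch.txt";

        // File.CreateText creates (or overwrites) the file and returns a StreamWriter.
        // Leaving the using block disposes the writer, which flushes and closes the file.
        using (StreamWriter sw = File.CreateText(path))
        {
            sw.Write("first ");
            sw.Write("second");
        }

        // Write adds no line terminator, so both strings end up on the same line.
        Console.WriteLine(File.ReadAllText(path)); // prints "first second"
    }
}
```

If each piece of extracted text should appear on its own line, WriteLine could be used in place of Write.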

Step 4: Extract the text from the text boxes.

[C#]
//Traverse the document
foreach (Section section in document.Sections)
{
    foreach (Paragraph p in section.Paragraphs)
    {
        foreach (DocumentObject obj in p.ChildObjects)
        {
            //Find the text boxes among the child objects of the paragraph
            if (obj.DocumentObjectType == DocumentObjectType.TextBox)
            {
                TextBox textbox = obj as TextBox;
                foreach (DocumentObject objt in textbox.ChildObjects)
                {
                    //Extract text from paragraph in TextBox
                    if (objt.DocumentObjectType == DocumentObjectType.Paragraph)
                    {
                        sw.Write((objt as Paragraph).Text);
                    }
                    //Extract text from Table in TextBox
                    if (objt.DocumentObjectType == DocumentObjectType.Table)
                    {
                        Table table = objt as Table;
                        ExtractTextFromTables(table, sw);
                    }
                }
            }
        }
    }
}
//Extract text from Table
static void ExtractTextFromTables(Table table, StreamWriter sw)
{
    for (int i = 0; i < table.Rows.Count; i++)
    {
        TableRow row = table.Rows[i];
        for (int j = 0; j < row.Cells.Count; j++)
        {
            TableCell cell = row.Cells[j];
            foreach (Paragraph paragraph in cell.Paragraphs)
            {
                sw.Write(paragraph.Text);
            }
        }
    }
}
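
The table helper above writes the text of every cell back to back, so cell boundaries are not visible in the output. As a rough sketch of one way to keep them visible, the example below walks rows and cells in the same order but separates cells with tabs and ends each row with a newline. It deliberately avoids Spire.Doc types: the jagged string array, the WriteRows method and the TableTextSketch class are hypothetical stand-ins for a table's rows and cell text, used only to show the pattern.

```csharp
using System;
using System.IO;

class TableTextSketch
{
    // Hypothetical stand-in: each inner array plays the role of one table row,
    // and each string plays the role of the text of one cell.
    static void WriteRows(string[][] rows, TextWriter writer)
    {
        for (int i = 0; i < rows.Length; i++)
        {
            // Join the cells of the row with tabs and end the row with a newline.
            writer.WriteLine(string.Join("\t", rows[i]));
        }
    }

    static void Main()
    {
        // Sample data invented for this example.
        string[][] rows =
        {
            new[] { "Name", "Qty" },
            new[] { "Pen", "2" }
        };

        // StringWriter keeps the output in memory so it can be inspected easily.
        using (StringWriter writer = new StringWriter())
        {
            WriteRows(rows, writer);
            Console.Write(writer.ToString());
        }
    }
}
```

Because WriteRows accepts any TextWriter, the same approach would also work with the StreamWriter from Step 3.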

After running the program, the following result is produced:

[Image: extract text from textbox]

The full code:

[C#]
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Spire.Doc;
using Spire.Doc.Fields;
using System.IO;
using Spire.Doc.Documents;
namespace ExtractTextFromTextBoxes
{
    class Program
    {
        static void Main(string[] args)
        {
            Document document = new Document();
            document.LoadFromFile(@"..\..\Test.docx");

            //Verify whether the document contains a textbox or not
            if (document.TextBoxes.Count > 0)
            {
                using (StreamWriter sw = File.CreateText("result.txt"))
                {
                    foreach (Section section in document.Sections)
                    {
                        foreach (Paragraph p in section.Paragraphs)
                        {
                            foreach (DocumentObject obj in p.ChildObjects)
                            {
                                if (obj.DocumentObjectType == DocumentObjectType.TextBox)
                                {
                                    TextBox textbox = obj as TextBox;
                                    foreach (DocumentObject objt in textbox.ChildObjects)
                                    {
                                        if (objt.DocumentObjectType == DocumentObjectType.Paragraph)
                                        {
                                            sw.Write((objt as Paragraph).Text);
                                        }

                                        if (objt.DocumentObjectType == DocumentObjectType.Table)
                                        {
                                            Table table = objt as Table;
                                            ExtractTextFromTables(table, sw);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
        static void ExtractTextFromTables(Table table, StreamWriter sw)
        {
            for (int i = 0; i < table.Rows.Count; i++)
            {
                TableRow row = table.Rows[i];
                for (int j = 0; j < row.Cells.Count; j++)
                {
                    TableCell cell = row.Cells[j];
                    foreach (Paragraph paragraph in cell.Paragraphs)
                    {
                        sw.Write(paragraph.Text);
                    }
                }
            }
        }
    }
}

Last modified on Monday, 14 July 2014 06:19