Spire.Doc
Wednesday, 25 April 2012 03:48

Extract Text from Word Document in C#, VB.NET

Written by support iceblue

Text can be extracted from a Word document and saved in another file (for example, a TXT file) for later use. This guide presents a convenient way to extract Word text in C# and VB.NET.

Generally speaking, a Word document contains many kinds of content, such as text, images, and tables. Some of this content can be extracted and reused in other documents or files. This guide focuses on how to extract text from a Word document and save it in a TXT file in C# and VB.NET via Spire.Doc for .NET. The following screenshot shows part of the text extracted from Word.

Extract Word Text

In Spire.Doc for .NET, all text is stored in the Paragraph objects of each Section. Therefore, you must first get the sections and paragraphs of the document and then read their text. The following steps describe the extraction in detail.

First, load the document by invoking the LoadFromFile method of the Document class with the parameter string fileName. Second, initialize a StringBuilder instance to hold the text that will be extracted. Third, use foreach statements to visit each paragraph of each section in the document, and invoke the AppendLine(Paragraph.Text) method of the StringBuilder class to append the text of every paragraph to the StringBuilder instance. Finally, invoke the File.WriteAllText method with the parameters string path and string contents to create a new file that holds the extracted text. The complete code is as follows.

[C#]
using Spire.Doc;
using Spire.Doc.Documents;
using System.Text;
using System.IO;

namespace ExtractTextfromWord
{
    class ExtractText
    {
        static void Main(string[] args)
        {
            //Load Document
            Document document = new Document();
            document.LoadFromFile(@"E:\Work\Documents\WordDocuments\Spire.Doc for .NET.docx");

            //Initialize StringBuilder Instance
            StringBuilder sb = new StringBuilder();

            //Extract Text from Word and Save to StringBuilder Instance
            foreach (Section section in document.Sections)
            {
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    sb.AppendLine(paragraph.Text);
                }
            }

            //Create a New TXT File to Save Extracted Text
            File.WriteAllText("Extract.txt", sb.ToString());
            //Open the saved file (the name must match the file written above)
            System.Diagnostics.Process.Start("Extract.txt");
        }
    }
}
[VB.NET]
Imports Spire.Doc
Imports Spire.Doc.Documents
Imports System.Text
Imports System.IO

Namespace ExtractTextfromWord
    Friend Class ExtractText
        Shared Sub Main(ByVal args() As String)
            'Load Document
            Dim document As New Document()
            document.LoadFromFile("E:\Work\Documents\WordDocuments\Spire.Doc for .NET.docx")

            'Initialize StringBuilder Instance
            Dim sb As New StringBuilder()

            'Extract Text from Word and Save to StringBuilder Instance
            For Each section As Section In document.Sections
                For Each paragraph As Paragraph In section.Paragraphs
                    sb.AppendLine(paragraph.Text)
                Next paragraph
            Next section

            'Create a New TXT File to Save Extracted Text
            File.WriteAllText("Extract.txt", sb.ToString())
            'Open the saved file (the name must match the file written above)
            System.Diagnostics.Process.Start("Extract.txt")
        End Sub
    End Class
End Namespace
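
The append-and-write steps above do not depend on Spire.Doc, so they can be isolated and checked without a Word file. The following is only a minimal sketch: the class ParagraphTextWriter and its methods Join and Save are hypothetical names introduced for illustration and are not part of Spire.Doc. It assumes the paragraph texts have already been collected as plain strings (for example, from paragraph.Text in the loops above).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not part of Spire.Doc): mirrors the
// StringBuilder.AppendLine + File.WriteAllText steps of the guide.
public static class ParagraphTextWriter
{
    // Joins paragraph texts, one per line, the same way the guide's loop does.
    public static string Join(IEnumerable<string> paragraphTexts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string text in paragraphTexts)
        {
            sb.AppendLine(text);
        }
        return sb.ToString();
    }

    // Writes the joined text to the given path and returns what was written.
    public static string Save(IEnumerable<string> paragraphTexts, string path)
    {
        string contents = Join(paragraphTexts);
        File.WriteAllText(path, contents);
        return contents;
    }
}
```

A call such as ParagraphTextWriter.Save(texts, "Extract.txt") would then replace the last two steps, where texts is any sequence of strings. Note that AppendLine adds a line terminator after every paragraph, including the last one.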

Spire.Doc, a professional standalone component for manipulating MS Word documents without automation, enables developers to generate, read, write, and modify Word documents in their .NET, WPF, and Silverlight applications.

Wednesday, 18 April 2012 06:15

How to Convert HTML to Image

Written by support iceblue

Spire.Doc lets users convert HTML to an image with C#/VB.NET. This makes HTML content readable at any time and anywhere on portable devices such as cell phones, MP4 players, PSP, iPad, iTouch, etc. Follow the simple steps below to convert HTML to an image with C#/VB.NET.

Friendly Reminder: Please make sure Spire.Doc and Visual Studio are correctly installed on the system.

Step 1: Create a C#/VB.NET project in Visual Studio and add Spire.Doc.dll as a reference. By default, Spire.Doc.dll is installed under "C:\Program Files\e-iceblue\Spire.Doc\Bin". Select the assembly Spire.Doc.dll and click OK to add it to the project.

Step 2: Add a "Button" to Form1. Double-click the button and add the following code at the top of the file.

[C#]
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Drawing;
using System.Drawing.Imaging;
using Spire.Doc;
using Spire.Doc.Documents;

namespace HTML2Image
{
    class Program
    {
        static void Main(string[] args)
        {
        }
    }
}
[VB.NET]
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text
Imports System.Drawing
Imports System.Drawing.Imaging
Imports Spire.Doc
Imports Spire.Doc.Documents

Namespace HTML2Image
	Class Program
		Private Shared Sub Main(args As String())
		End Sub
	End Class
End Namespace

Step 3: Use the code below to load the HTML file that will be converted to an image.

[C#]
            Document document = new Document();
            document.LoadFromFile(@"D:\test.html", FileFormat.Html, XHTMLValidationType.None);
[VB.NET]
Dim document As New Document()
document.LoadFromFile("D:\test.html", FileFormat.Html, XHTMLValidationType.None)

Step 4: Convert the loaded HTML to an image with the following code. Spire.Doc enables users to convert HTML to BMP, JPEG, PNG, GIF, TIFF, etc.

[C#]
            Image image = document.SaveToImages(0, ImageType.Bitmap);
            image.Save("Sample.png", ImageFormat.Png);
[VB.NET]
Dim image As Image = document.SaveToImages(0, ImageType.Bitmap)
image.Save("Sample.png", ImageFormat.Png)
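
Step 4 saves the result as PNG, but the same Image.Save call accepts the other formats listed above through System.Drawing.Imaging.ImageFormat. As a sketch only, the output format could be chosen from the file extension. The class ImageFormatPicker and its method FromFileName are hypothetical names introduced here, not part of Spire.Doc, and the example assumes System.Drawing is available to the project.

```csharp
using System;
using System.Drawing.Imaging;
using System.IO;

// Hypothetical helper (not part of Spire.Doc): maps a file extension
// to the matching System.Drawing.Imaging.ImageFormat.
public static class ImageFormatPicker
{
    public static ImageFormat FromFileName(string fileName)
    {
        // Compare extensions case-insensitively, e.g. ".PNG" and ".png".
        string ext = Path.GetExtension(fileName).ToLowerInvariant();
        switch (ext)
        {
            case ".bmp":
                return ImageFormat.Bmp;
            case ".jpg":
            case ".jpeg":
                return ImageFormat.Jpeg;
            case ".png":
                return ImageFormat.Png;
            case ".gif":
                return ImageFormat.Gif;
            case ".tif":
            case ".tiff":
                return ImageFormat.Tiff;
            default:
                // Fail loudly instead of silently picking a format.
                throw new ArgumentException("Unsupported image extension: " + ext, "fileName");
        }
    }
}
```

With this helper, the save line could be written as image.Save(name, ImageFormatPicker.FromFileName(name)), where name is, for example, "Sample.jpg".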

HTML to Image

Press F5 to start the project. The image is saved in the project's bin\Debug folder.

Spire.Doc can convert HTML to most popular file formats, including PDF, Word, XML, RTF, Text, ePub, etc.

Thursday, 12 April 2012 06:08

Word Table for Silverlight

Written by support iceblue

This sample demonstrates how to create a table in Word for Silverlight via Spire.Doc.