Wednesday, 06 April 2011 09:47

PDF Extract in C#, VB.NET

The sample demonstrates how to extract images and text from a PDF document.

Published in Document Operation
Tuesday, 28 December 2010 07:50

Extract Images from Word in C#, VB.NET

The solution in this guide demonstrates how to extract images from an existing Word document and save them to a specified path in C# and VB.NET using Spire.Doc for .NET.

An image is one kind of document object and belongs to the items of a paragraph. Spire.Doc for .NET provides the DocumentObject class to represent the items stored in a Document, including images, and the DocPicture class to get and set the images of a document. Download and install Spire.Doc for .NET, then follow these steps to extract images from a Word document.

  • Get each Paragraph of each Section in the Document.
  • Get each DocumentObject in the ChildObjects of the Paragraph.
  • If the DocumentObjectType of the object is Picture, declare a DocPicture instance and assign the DocumentObject to it.
  • Build a String with String.Format(String format, object arg0) to give each extracted image a new name instead of its original name.
  • Invoke the DocPicture.Image.Save(String, ImageFormat) method to save each image.

[C#]
            //Requires: using System; using Spire.Doc; using Spire.Doc.Documents; using Spire.Doc.Fields;

            //Load document
            Document document = new Document(@"E:\Work\Documents\WordDocuments\Spire.Doc for .NET.docx");
            int index = 0;

            //Get Each Section of Document
            foreach (Section section in document.Sections)
            {
                //Get Each Paragraph of Section
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    //Get Each Document Object of Paragraph Items
                    foreach (DocumentObject docObject in paragraph.ChildObjects)
                    {
                        //If Type of Document Object is Picture, Extract.
                        if (docObject.DocumentObjectType == DocumentObjectType.Picture)
                        {
                            DocPicture picture = docObject as DocPicture;

                            //Name Image
                            String imageName = String.Format(@"images\Image-{0}.png", index);

                            //Save Image
                            picture.Image.Save(imageName, System.Drawing.Imaging.ImageFormat.Png);

                            //Advance the counter so the next image gets a new file name
                            index++;
                        }
                    }
                }
            }

[VB.NET]
            'Requires: Imports Spire.Doc, Spire.Doc.Documents, Spire.Doc.Fields

            'Load document
            Dim document As New Document("E:\Work\Documents\WordDocuments\Spire.Doc for .NET.docx")
            Dim index As Integer = 0

            'Get Each Section of Document
            For Each section As Section In document.Sections
                'Get Each Paragraph of Section
                For Each paragraph As Paragraph In section.Paragraphs
                    'Get Each Document Object of Paragraph Items
                    For Each docObject As DocumentObject In paragraph.ChildObjects
                        'If Type of Document Object is Picture, Extract.
                        If docObject.DocumentObjectType = DocumentObjectType.Picture Then
                            Dim picture As DocPicture = TryCast(docObject, DocPicture)

                            'Name Image
                            Dim imageName As String = String.Format("images\Image-{0}.png", index)

                            'Save Image
                            picture.Image.Save(imageName, System.Drawing.Imaging.ImageFormat.Png)
                            index += 1
                        End If
                    Next docObject
                Next paragraph
            Next section

After running the program, all the extracted images are saved in the specified path. Open the directory to find the images.

Extract Word Image
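
Both samples save into a relative images\ folder. Image.Save does not create missing folders, so the save typically fails if that folder does not exist yet. The following is a minimal sketch of one way to handle this, not part of the original sample: ImageNaming and BuildImagePath are hypothetical names introduced here for illustration, and the helper uses only System.IO, not Spire.Doc.

```csharp
using System;
using System.IO;

// Hypothetical helper (an assumption for illustration, not a Spire.Doc API).
static class ImageNaming
{
    // Returns "<outputDir>/Image-<index>.png" and makes sure outputDir exists.
    public static string BuildImagePath(string outputDir, int index)
    {
        // CreateDirectory does nothing if the directory is already there.
        Directory.CreateDirectory(outputDir);

        // Same naming scheme as the samples: Image-0.png, Image-1.png, ...
        string fileName = String.Format("Image-{0}.png", index);

        // Path.Combine inserts the correct directory separator.
        return Path.Combine(outputDir, fileName);
    }
}
```

With such a helper, the save call in the C# sample could be written as picture.Image.Save(ImageNaming.BuildImagePath("images", index), System.Drawing.Imaging.ImageFormat.Png).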

Spire.Doc, an easy-to-use component for working with Word documents, allows developers to quickly generate, write, edit and save Word documents (Word 97-2003, Word 2007, Word 2010) in C# and VB.NET for .NET, Silverlight and WPF.

Published in Image and Shape