Extracting text and images is a common requirement when working with Word documents. It lets you save useful content from the original document and reuse it in a new document or for other purposes. In this article, you will learn how to extract text or images from a Word document using Spire.Doc for C++.
Install Spire.Doc for C++
There are two ways to integrate Spire.Doc for C++ into your application. One is to install it through NuGet; the other is to download the package from our website and copy the libraries into your program. Installation via NuGet is simpler and is the recommended approach. For more details, see the following link.
Integrate Spire.Doc for C++ in a C++ Application
Extract Text from a Word Document in C++
To extract the text content from an existing Word document, Spire.Doc for C++ provides the Document->GetText() method. The following steps extract the text and save it to a TXT file.
- Create a Document instance.
- Load a sample Word document using the Document->LoadFromFile() method.
- Get the text from the document using the Document->GetText() method.
- Create a new TXT file and write the extracted text to it.
```cpp
#include "Spire.Doc.o.h"
#include <fstream>
#include <string>

using namespace Spire::Doc;

int main()
{
    // Specify the input file path and name
    std::wstring data_path = L"Data\\";
    std::wstring inputFile = data_path + L"input.docx";

    // Specify the output file path and name
    std::wstring outputPath = L"Output\\";
    std::wstring outputFile = outputPath + L"GetText.txt";

    // Create a Document instance
    Document* document = new Document();

    // Load a sample Word document from disk
    document->LoadFromFile(inputFile.c_str());

    // Get the text from the document
    std::wstring text = document->GetText();

    // Create a new TXT file and write the extracted text to it
    std::wofstream write(outputFile);
    write << text;
    write.close();

    // Release the document
    document->Close();
    delete document;
    return 0;
}
```
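The sample writes the extracted text through std::wofstream. Depending on the platform and the active locale, a wide stream may stop writing at the first non-ASCII character, and the write also fails if the Output folder does not exist yet. The following is a minimal sketch of one possible workaround, assuming a C++17 compiler: it encodes the wide string as UTF-8 by hand and creates the parent folder before writing. ToUtf8 and SaveTextAsUtf8 are hypothetical helper names used for illustration only; they are not part of Spire.Doc for C++.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper (not a Spire.Doc API): encode a wide string as UTF-8.
// Works for 16-bit wchar_t (UTF-16, Windows) and 32-bit wchar_t (UTF-32).
std::string ToUtf8(const std::wstring& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        std::uint32_t cp = static_cast<std::uint32_t>(text[i]);

        if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            // Surrogate: combine a valid high/low pair, otherwise use U+FFFD
            std::uint32_t low =
                (i + 1 < text.size()) ? static_cast<std::uint32_t>(text[i + 1]) : 0;
            if (cp <= 0xDBFF && low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if (cp > 0x10FFFF)
        {
            cp = 0xFFFD; // outside the Unicode range
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: create the parent folder if needed, then write the
// text as UTF-8 bytes. Returns false if the folder or file cannot be written.
bool SaveTextAsUtf8(const std::filesystem::path& file, const std::wstring& text)
{
    if (file.has_parent_path())
    {
        std::error_code ec;
        std::filesystem::create_directories(file.parent_path(), ec);
        if (ec)
        {
            return false;
        }
    }

    std::ofstream out(file, std::ios::binary);
    if (!out)
    {
        return false;
    }

    const std::string bytes = ToUtf8(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

In the sample, the three std::wofstream lines could then be replaced with a call such as SaveTextAsUtf8(outputFile, text), checking the returned bool to detect a failed write.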
Extract Images from a Word Document in C++
For a Word document that contains many images, saving them manually one by one is time-consuming. The steps below extract all images at once using Spire.Doc for C++.
- Create a Document instance and load a sample Word document using the Document->LoadFromFile() method.
- Add the document to a deque of nodes to visit, and create a vector to hold the extracted images.
- Traverse all child objects of the document, taking nodes from the front of the deque.
- Check whether each child object is a picture. If it is, get the image using the DocPicture->GetImage() method and add it to the vector. Otherwise, if the child can contain other objects, add it to the deque.
- Save the extracted images to the specified output folder.
```cpp
#include "Spire.Doc.o.h"
#include <deque>
#include <string>
#include <vector>

using namespace Spire::Doc;

int main()
{
    // Specify the input file path and name
    std::wstring data_path = L"Data\\";
    std::wstring inputFile = data_path + L"input.docx";

    // Specify the output folder
    std::wstring outputPath = L"Output\\";
    std::wstring outputFile = outputPath + L"ExtractImage/";

    // Load a sample Word document
    Document* document = new Document();
    document->LoadFromFile(inputFile.c_str());

    // Add the document to the deque of nodes to visit
    std::deque<ICompositeObject*> nodes;
    nodes.push_back(document);

    // Create a vector to hold the extracted images
    std::vector<Image*> images;

    // Traverse all child objects of the document
    while (!nodes.empty())
    {
        ICompositeObject* node = nodes.front();
        nodes.pop_front();
        for (int i = 0; i < node->GetChildObjects()->GetCount(); i++)
        {
            IDocumentObject* child = node->GetChildObjects()->GetItem(i);

            if (child->GetDocumentObjectType() == DocumentObjectType::Picture)
            {
                // Get the image and add it to the vector
                DocPicture* picture = dynamic_cast<DocPicture*>(child);
                images.push_back(picture->GetImage());
            }
            else if (ICompositeObject* composite = dynamic_cast<ICompositeObject*>(child))
            {
                // Queue objects that can contain further children
                nodes.push_back(composite);
            }
        }
    }

    // Save the extracted images as PNG files
    for (size_t i = 0; i < images.size(); i++)
    {
        std::wstring fileName = L"Image-" + std::to_wstring(i) + L".png";
        images[i]->Save((outputFile + fileName).c_str(), ImageFormat::GetPng());
    }

    // Release the document
    document->Close();
    delete document;
    return 0;
}
```
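The traversal in the sample is a breadth-first walk: nodes are taken from the front of the deque, pictures are collected, and any other child that can contain objects is pushed to the back. The sketch below isolates that pattern so it can be compiled and tried without the library. Node, AddChild and CollectByKind are hypothetical stand-ins invented for this illustration; they only mimic the shape of a document tree and are not Spire.Doc types.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a document object (not a Spire.Doc type)
struct Node
{
    std::string kind;                            // e.g. "Paragraph", "Picture"
    std::vector<std::unique_ptr<Node>> children; // empty for leaf objects
};

// Hypothetical helper: append a child of the given kind and return it
Node& AddChild(Node& parent, const std::string& kind)
{
    parent.children.push_back(std::make_unique<Node>());
    parent.children.back()->kind = kind;
    return *parent.children.back();
}

// Collect every node of the requested kind, visiting the tree level by level
std::vector<const Node*> CollectByKind(const Node& root, const std::string& kind)
{
    std::vector<const Node*> found;
    std::deque<const Node*> pending;
    pending.push_back(&root);

    while (!pending.empty())
    {
        const Node* node = pending.front();
        pending.pop_front();

        for (const auto& child : node->children)
        {
            assert(child != nullptr);
            if (child->kind == kind)
            {
                found.push_back(child.get());   // matching object: collect it
            }
            else
            {
                pending.push_back(child.get()); // otherwise visit its children later
            }
        }
    }
    return found;
}
```

Using a deque as a queue keeps the walk iterative, so deeply nested structures such as tables inside tables do not grow the call stack the way a recursive walk would.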
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the feature limitations, please request a 30-day trial license.