Web applications increasingly need to process documents directly in the browser. For developers working in the React ecosystem, extracting text from Word documents with JavaScript makes it possible to present document content dynamically, so users can import, edit, and interact with text data inside the web interface. This article shows how to use Spire.Doc for JavaScript to extract text from Word documents in React applications. It covers the following topics:
- Extract All Text from a Word Document Using JavaScript
- Extract Text from Specific Sections or Paragraphs in a Word Document
- Extract Text from a Word Document Based on Paragraph Styles
Install Spire.Doc for JavaScript
To get started with extracting text from Word documents in a React application, you can either download Spire.Doc for JavaScript from our website or install it via npm with the following command:
npm i spire.office
The downloaded product package integrates Spire.Doc for JavaScript, Spire.XLS for JavaScript, Spire.PDF for JavaScript, and Spire.Presentation for JavaScript. To use the features of Spire.Doc for JavaScript, copy the corresponding files (spire.doc.js, Spire.Doc.Wasm.zip, spire.common.js, Spire.Common.Wasm.zip, and the _framework folder) to the public folder of your project. To ensure proper text rendering, you can add the relevant font files under a custom path. In the following examples, the font file (Arial.ttf) is placed in public/static/font and the sample document (Sample.docx) in public/static/data.
For more details, refer to the documentation: How to Integrate Spire.Doc for JavaScript in a React Project
Extract All Text from a Word Document Using JavaScript
To extract the complete text content of a Word document, Spire.Doc for JavaScript offers the Document.GetText() method, which returns all the text in the document as a single string. The implementation steps are as follows:
- Load the spire.doc.js file to initialize the WebAssembly module.
- Load the Word file into the virtual file system using the window.spire.FetchFileToVFS method.
- Create a Document instance in the WebAssembly module with the new wasmModule.Document() constructor.
- Load the Word document into the Document instance with the Document.LoadFromFile() method.
- Retrieve the document's text as a string using the Document.GetText() method.
- Process the extracted text, such as downloading it as a text file or performing additional operations.
import React, { useState, useEffect } from 'react';

function App() {
  const [wasmModule, setWasmModule] = useState(null);

  // Load Spire.Doc
  useEffect(() => {
    (async () => {
      try {
        const publicUrl = process.env.PUBLIC_URL || '';
        const spireModule = await import(/* webpackIgnore: true */ `${publicUrl}/spire.doc.js`);
        const rawModule = spireModule.default || spireModule;
        window.wasmModule = typeof rawModule === 'function'
          ? await rawModule({ locateFile: p => p.endsWith('.wasm') ? `${publicUrl}/${p}` : p })
          : rawModule;
        setWasmModule(window.wasmModule);
      } catch (error) {
        console.error('Failed to load spire.doc.js WASM module:', error);
      }
    })();
  }, []);

  // Function to extract all text from a Word document
  const ExtractAllTextFromWord = async () => {
    const wasmModule = window.wasmModule.spiredoc;
    if (wasmModule) {
      // Load the font files into the virtual file system (VFS)
      await window.spire.FetchFileToVFS('Arial.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/static/font/`);

      // Specify the input file name and the output file name
      const inputFileName = 'Sample.docx';
      const outputFileName = 'ExtractWordText.txt';

      // Fetch the input file and add it to the VFS
      await window.spire.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/static/data/`);

      // Create an instance of the Document class
      const doc = new wasmModule.Document();

      // Load the Word document
      doc.LoadFromFile({ fileName: inputFileName });

      // Get all text from the document
      const documentText = doc.GetText();

      // Release resources
      doc.Dispose();

      // Generate a Blob from the extracted text and trigger a download
      const blob = new Blob([documentText], { type: 'text/plain' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement("a");
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Extract All Text from Word Documents Using JavaScript in React</h1>
      <button onClick={ExtractAllTextFromWord} disabled={!wasmModule}>
        Convert and Download
      </button>
    </div>
  );
}

export default App;
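The last step above mentions performing additional operations on the extracted text. As one possible illustration, the following sketch tidies the string returned by Document.GetText() and computes a few simple statistics before the text is displayed or saved. The helper names normalizeExtractedText and summarizeText are hypothetical and not part of Spire.Doc for JavaScript; the code is plain JavaScript and assumes only that the extracted text may contain mixed line endings and runs of blank lines.

```javascript
// Hypothetical helper (not a Spire.Doc API): clean up extracted text.
function normalizeExtractedText(text) {
  return text
    .replace(/\r\n?/g, '\n')          // unify Windows and old Mac line endings
    .split('\n')
    .map(line => line.trimEnd())      // drop trailing whitespace on each line
    .join('\n')
    .replace(/\n{3,}/g, '\n\n')       // collapse runs of blank lines to one
    .trim();
}

// Hypothetical helper: basic statistics about the cleaned text.
function summarizeText(text) {
  const normalized = normalizeExtractedText(text);
  const lines = normalized === '' ? [] : normalized.split('\n');
  const words = normalized.split(/\s+/).filter(Boolean);
  return {
    lineCount: lines.length,
    nonEmptyLineCount: lines.filter(line => line !== '').length,
    wordCount: words.length,
    preview: normalized.slice(0, 80), // first 80 characters, e.g. for a UI label
  };
}
```

In the component above, these helpers could be called on documentText right after doc.GetText(), for example to show a word count next to the download button.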

Extract Text from Specific Sections or Paragraphs in a Word Document
When only specific sections or paragraphs of a Word document are needed, Spire.Doc for JavaScript offers the Section.Paragraphs.get_Item(index).Text property to read the text of individual paragraphs. The following steps outline the process:
- Load the spire.doc.js file to initialize the WebAssembly module.
- Use the window.spire.FetchFileToVFS method to load the Word file into the virtual file system.
- Create a Document instance with the new wasmModule.Document() constructor.
- Load the Word document into the Document instance with the Document.LoadFromFile() method.
- Access a specific section using the Document.Sections.get_Item() method.
- Extract text from a specific paragraph with the Section.Paragraphs.get_Item().Text property.
- To retrieve all text within a section, iterate through the section's paragraphs and concatenate their text into a single string.
- Process the extracted text, such as saving it to a file or performing further analysis.
import React, { useState, useEffect } from 'react';

function App() {
  const [wasmModule, setWasmModule] = useState(null);

  // Load Spire.Doc
  useEffect(() => {
    (async () => {
      try {
        const publicUrl = process.env.PUBLIC_URL || '';
        const spireModule = await import(/* webpackIgnore: true */ `${publicUrl}/spire.doc.js`);
        const rawModule = spireModule.default || spireModule;
        window.wasmModule = typeof rawModule === 'function'
          ? await rawModule({ locateFile: p => p.endsWith('.wasm') ? `${publicUrl}/${p}` : p })
          : rawModule;
        setWasmModule(window.wasmModule);
      } catch (error) {
        console.error('Failed to load spire.doc.js WASM module:', error);
      }
    })();
  }, []);

  // Function to extract text from a specific part of a Word document
  const ExtractTextFromWordPart = async () => {
    const wasmModule = window.wasmModule.spiredoc;
    if (wasmModule) {
      // Load the font files into the virtual file system (VFS)
      await window.spire.FetchFileToVFS('Arial.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/static/font/`);

      // Specify the input file name and the output file name
      const inputFileName = 'Sample.docx';
      const outputFileName = 'ExtractWordText.txt';

      // Fetch the input file and add it to the VFS
      await window.spire.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/static/data/`);

      // Create an instance of the Document class
      const doc = new wasmModule.Document();

      // Load the Word document
      doc.LoadFromFile({ fileName: inputFileName });

      // Get a specific section from the document.
      // Indexes are zero-based, so 1 selects the second section;
      // the sample document must contain at least two sections.
      const section = doc.Sections.get_Item(1);

      // Get the text of a specific paragraph in the section
      // const paragraphText = section.Paragraphs.get_Item(1).Text;

      // Extract all text from the section
      let sectionText = "";
      for (let i = 0; i < section.Paragraphs.Count; i++) {
        // Extract the text from the current paragraph
        const text = section.Paragraphs.get_Item(i).Text;
        sectionText += text + "\n";
      }

      // Release resources
      doc.Dispose();

      // Generate a Blob from the extracted text and trigger a download
      const blob = new Blob([sectionText], { type: 'text/plain' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement("a");
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Extract Text from a Specific Part of a Word Document Using JavaScript in React</h1>
      <button onClick={ExtractTextFromWordPart} disabled={!wasmModule}>
        Convert and Download
      </button>
    </div>
  );
}

export default App;
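The example above hard-codes a section index, which will fail if the document has fewer sections than expected. The following sketch wraps the same loop in a reusable function that checks the index first. It relies only on the members already used in this article (Sections.Count, Sections.get_Item(), Paragraphs.Count, Paragraphs.get_Item(), and Text). The names getSectionText and makeMockDocument are hypothetical; the mock builder is merely a stand-in object with the same shape, so the helper can be tried without loading the WebAssembly module.

```javascript
// Hypothetical helper: return the text of every paragraph in one section,
// one paragraph per line. Throws if the zero-based index is out of range.
function getSectionText(doc, sectionIndex) {
  const sectionCount = doc.Sections.Count;
  if (!Number.isInteger(sectionIndex) || sectionIndex < 0 || sectionIndex >= sectionCount) {
    throw new RangeError(
      `Section index ${sectionIndex} is out of range (document has ${sectionCount} section(s)).`
    );
  }
  const section = doc.Sections.get_Item(sectionIndex);
  const lines = [];
  for (let i = 0; i < section.Paragraphs.Count; i++) {
    lines.push(section.Paragraphs.get_Item(i).Text);
  }
  return lines.join('\n');
}

// Hypothetical stand-in for a loaded document (NOT the Spire.Doc object):
// it only mimics the Count / get_Item / Text / StyleName shape used above.
function makeMockDocument(sections) {
  const collection = items => ({ Count: items.length, get_Item: i => items[i] });
  return {
    Sections: collection(
      sections.map(paragraphs => ({
        Paragraphs: collection(
          paragraphs.map(p => ({ Text: p.text, StyleName: p.style }))
        ),
      }))
    ),
  };
}

// Usage with the stand-in document: two sections, the second has two paragraphs.
const mockDoc = makeMockDocument([
  [{ text: 'Cover page', style: 'Title' }],
  [{ text: 'Overview', style: 'Heading1' }, { text: 'Body text.', style: 'Normal' }],
]);
console.log(getSectionText(mockDoc, 1));
```

Inside the component, the helper would be called with the real Document instance before doc.Dispose(), for example getSectionText(doc, 0) for the first section.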

Extract Text from a Word Document Based on Paragraph Styles
To extract only the text formatted with a specific paragraph style, use the Paragraph.StyleName property to identify and filter paragraphs by style. This approach works well for structured documents with distinct headings or other styled elements. The implementation process is as follows:
- Load the spire.doc.js file to initialize the WebAssembly module.
- Load the Word file into the virtual file system using the window.spire.FetchFileToVFS() method.
- Create a Document instance in the WebAssembly module with the new wasmModule.Document() constructor.
- Load the Word document into the Document instance using the Document.LoadFromFile() method.
- Define the target style name or retrieve one from the document.
- Iterate through the document's sections and their paragraphs:
  - Use the Paragraph.StyleName property to identify each paragraph's style.
  - Compare the paragraph's style name with the target style. If they match, retrieve the paragraph's text using the Paragraph.Text property.
- Process the retrieved text, such as saving it to a file or using it for further operations.
import React, { useState, useEffect } from 'react';

function App() {
  const [wasmModule, setWasmModule] = useState(null);

  // Load Spire.Doc
  useEffect(() => {
    (async () => {
      try {
        const publicUrl = process.env.PUBLIC_URL || '';
        const spireModule = await import(/* webpackIgnore: true */ `${publicUrl}/spire.doc.js`);
        const rawModule = spireModule.default || spireModule;
        window.wasmModule = typeof rawModule === 'function'
          ? await rawModule({ locateFile: p => p.endsWith('.wasm') ? `${publicUrl}/${p}` : p })
          : rawModule;
        setWasmModule(window.wasmModule);
      } catch (error) {
        console.error('Failed to load spire.doc.js WASM module:', error);
      }
    })();
  }, []);

  // Function to extract text from a Word document based on paragraph styles
  const ExtractTextByParagraphStyle = async () => {
    const wasmModule = window.wasmModule.spiredoc;
    if (wasmModule) {
      // Load the font files into the virtual file system (VFS)
      await window.spire.FetchFileToVFS('Arial.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/static/font/`);

      // Specify the input file name and the output file name
      const inputFileName = 'Sample.docx';
      const outputFileName = 'ExtractWordText.txt';

      // Fetch the input file and add it to the VFS
      await window.spire.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/static/data/`);

      // Create an instance of the Document class
      const doc = new wasmModule.Document();

      // Load the Word document
      doc.LoadFromFile({ fileName: inputFileName });

      // Define the style name or get the style name of the target paragraph style
      const styleName = 'Heading2';
      // const styleName = doc.Sections.get_Item(2).Paragraphs.get_Item(2).StyleName;

      // Array to store extracted text
      let paragraphStyleText = [];

      // Iterate through the sections in the document
      for (let sectionIndex = 0; sectionIndex < doc.Sections.Count; sectionIndex++) {
        // Get the current section
        const section = doc.Sections.get_Item(sectionIndex);

        // Iterate through the paragraphs in the section
        for (let paragraphIndex = 0; paragraphIndex < section.Paragraphs.Count; paragraphIndex++) {
          // Get the current paragraph
          const paragraph = section.Paragraphs.get_Item(paragraphIndex);

          // Get the style name of the paragraph
          const paraStyleName = paragraph.StyleName;

          // Check if the style name matches the target style
          if (paraStyleName === styleName) {
            // Extract the text from the paragraph
            const paragraphText = paragraph.Text;
            console.log(paragraphText);
            // Append the extracted text to the array
            paragraphStyleText.push(paragraphText);
          }
        }
      }

      // Release resources
      doc.Dispose();

      // Generate a Blob from the extracted text and trigger a download
      const text = paragraphStyleText.join('\n');
      const blob = new Blob([text], { type: 'text/plain' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement("a");
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Extract Text from Word Documents by Paragraph Style Using JavaScript in React</h1>
      <button onClick={ExtractTextByParagraphStyle} disabled={!wasmModule}>
        Convert and Download
      </button>
    </div>
  );
}

export default App;
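Filtering by style only works if the target name matches what the document actually reports through Paragraph.StyleName. As a rough sketch, the helpers below first count paragraphs per style name, which can help discover the exact names in use, and then collect text for several styles in one pass. The function names forEachParagraph, countParagraphsByStyle, and collectTextByStyles are hypothetical and not part of Spire.Doc for JavaScript; they assume only the Sections, Paragraphs, Count, get_Item(), StyleName, and Text members used in the example above.

```javascript
// Hypothetical helper: call visit(paragraph) for every paragraph in the document.
function forEachParagraph(doc, visit) {
  for (let s = 0; s < doc.Sections.Count; s++) {
    const paragraphs = doc.Sections.get_Item(s).Paragraphs;
    for (let p = 0; p < paragraphs.Count; p++) {
      visit(paragraphs.get_Item(p));
    }
  }
}

// Hypothetical helper: Map of style name -> number of paragraphs using it.
function countParagraphsByStyle(doc) {
  const counts = new Map();
  forEachParagraph(doc, paragraph => {
    const name = paragraph.StyleName;
    counts.set(name, (counts.get(name) || 0) + 1);
  });
  return counts;
}

// Hypothetical helper: collect { style, text } entries, in document order,
// for paragraphs whose style name is listed in styleNames.
function collectTextByStyles(doc, styleNames) {
  const wanted = new Set(styleNames);
  const matches = [];
  forEachParagraph(doc, paragraph => {
    if (wanted.has(paragraph.StyleName)) {
      matches.push({ style: paragraph.StyleName, text: paragraph.Text });
    }
  });
  return matches;
}
```

In the component, calling countParagraphsByStyle(doc) before doc.Dispose() and logging the result is one way to check which style names to pass to collectTextByStyles, for example ['Heading1', 'Heading2'] to build a simple outline.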

Get a Free License
To fully experience the capabilities of Spire.Doc for JavaScript without any evaluation limitations, you can request a free 30-day trial license.
