Java: Read Content from a Word Document
Extracting content from Word documents is a common task in both work and study. Extracting a single page helps you quickly browse and summarize key points, extracting a section supports in-depth study of a specific topic, and extracting the entire document gives you a complete view of its content for thorough analysis. This article introduces how to use Spire.Doc for Java to read a page, a section, and the entire content of a Word document in a Java project.
- Read a Page from a Word Document in Java
- Read a Section from a Word Document in Java
- Read the Entire Content from a Word Document in Java
Install Spire.Doc for Java
First, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Read a Page from a Word Document in Java
The FixedLayoutDocument and FixedLayoutPage classes expose the rendered page layout of a document, which makes it easy to extract the content of a specified page. To make the extracted content easy to inspect, the following example saves it to a new Word document. The detailed steps are as follows:
- Create a Document object.
- Load a Word document using the Document.loadFromFile() method.
- Create a FixedLayoutDocument object.
- Obtain a FixedLayoutPage object for a page in the document.
- Use the FixedLayoutPage.getSection() method to get the section where the page is located.
- Get the index position of the first paragraph on the page within the section.
- Get the index position of the last paragraph on the page within the section.
- Create another Document object.
- Add a new section using Document.addSection().
- Clone the properties of the original section to the new section using Section.cloneSectionPropertiesTo(newSection) method.
- Copy the content of the page from the original document to the new document.
- Save the resulting document using the Document.saveToFile() method.
import com.spire.doc.*;
import com.spire.doc.pages.*;
import com.spire.doc.documents.*;
public class ReadOnePage {
    public static void main(String[] args) {
        // Create a new document object
        Document document = new Document();
        // Load document content from the specified file
        document.loadFromFile("Sample.docx");
        // Create a fixed layout document object
        FixedLayoutDocument layoutDoc = new FixedLayoutDocument(document);
        // Get the first page
        FixedLayoutPage page = layoutDoc.getPages().get(0);
        // Get the section where the page is located
        Section section = page.getSection();
        // Get the first paragraph of the page
        Paragraph paragraphStart = page.getColumns().get(0).getLines().getFirst().getParagraph();
        int startIndex = 0;
        if (paragraphStart != null) {
            // Get the index of the paragraph in the section
            startIndex = section.getBody().getChildObjects().indexOf(paragraphStart);
        }
        // Get the last paragraph of the page
        Paragraph paragraphEnd = page.getColumns().get(0).getLines().getLast().getParagraph();
        int endIndex = 0;
        if (paragraphEnd != null) {
            // Get the index of the paragraph in the section
            endIndex = section.getBody().getChildObjects().indexOf(paragraphEnd);
        }
        // Create a new document object
        Document newdoc = new Document();
        // Add a new section
        Section newSection = newdoc.addSection();
        // Clone the properties of the original section to the new section
        section.cloneSectionPropertiesTo(newSection);
        // Copy the content of the original document's page to the new document
        for (int i = startIndex; i <= endIndex; i++) {
            newSection.getBody().getChildObjects().add(section.getBody().getChildObjects().get(i).deepClone());
        }
        // Save the new document to the specified file
        newdoc.saveToFile("Content of One Page.docx", FileFormat.Docx);
        // Close and release the new document
        newdoc.close();
        newdoc.dispose();
        // Close and release the original document
        document.close();
        document.dispose();
    }
}
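At its core, the page example is index arithmetic: it finds the positions of the page's first and last paragraphs in the section body and clones every child object between them, inclusive. The stdlib-only sketch below models just that step, with a List<String> standing in for Section.getBody().getChildObjects(); the class and method names are hypothetical and no Spire.Doc calls are involved. It also adds a guard the example lacks: assuming indexOf() returns -1 when a paragraph is not a direct child of the body (for example, when a page begins or ends inside a table), the copy loop would otherwise fail or copy the wrong range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: each String stands in for one child object of
// Section.getBody().getChildObjects(). No Spire.Doc classes are used.
final class PageRangeSketch {

    // Index of 'item' in 'body', or 0 when 'item' is null (the fallback used in the example)
    static int indexOrZero(List<String> body, String item) {
        return item == null ? 0 : body.indexOf(item);
    }

    // Copies body[startIndex..endIndex], both ends inclusive
    static List<String> copyPage(List<String> body, String firstOnPage, String lastOnPage) {
        int startIndex = indexOrZero(body, firstOnPage);
        int endIndex = indexOrZero(body, lastOnPage);
        if (startIndex < 0 || endIndex < startIndex) {
            // -1 means the paragraph was not found among the body's direct children
            throw new IllegalArgumentException("Page boundaries not found in the section body");
        }
        List<String> copy = new ArrayList<>();
        for (int i = startIndex; i <= endIndex; i++) {
            copy.add(body.get(i)); // the Spire.Doc example adds a deepClone() here
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("Title", "Intro", "Para3", "Para4", "Para5");
        // Suppose the first page runs from "Title" to "Para3"
        System.out.println(copyPage(body, "Title", "Para3"));
    }
}
```

Running main prints [Title, Intro, Para3]: both boundary paragraphs are included because the loop condition is i <= endIndex.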
Read a Section from a Word Document in Java
Using Document.getSections().get(index), you can access a specific Section object, which contains the header, footer, and body content of that part of the document. The following example demonstrates a simple way to copy all content from one section into another document. The detailed steps are as follows:
- Create a Document object.
- Load a Word document using the Document.loadFromFile() method.
- Use Document.getSections().get(1) to retrieve the second section of the document.
- Create another new Document object.
- Clone the default style of the original document to the new document using Document.cloneDefaultStyleTo(newdoc) method.
- Use Document.getSections().add(section.deepClone()) to clone the content of the second section of the original document to the new document.
- Save the resulting document using the Document.saveToFile() method.
import com.spire.doc.*;
public class ReadOneSection {
    public static void main(String[] args) {
        // Create a new document object
        Document document = new Document();
        // Load a Word document from a file
        document.loadFromFile("Sample.docx");
        // Get the second section of the document
        Section section = document.getSections().get(1);
        // Create a new document object
        Document newdoc = new Document();
        // Clone the default style to the new document
        document.cloneDefaultStyleTo(newdoc);
        // Clone the second section to the new document
        newdoc.getSections().add(section.deepClone());
        // Save the new document to a file
        newdoc.saveToFile("Content of One Section.docx", FileFormat.Docx);
        // Close and release the new document object
        newdoc.close();
        newdoc.dispose();
        // Close and release the original document object
        document.close();
        document.dispose();
    }
}
Read the Entire Content from a Word Document in Java
This example iterates through each section of the original document and clones it into a new document, which reads the entire content of the document. Because whole sections are cloned, the new document keeps the structure, formatting, and layout of the original, which helps preserve the integrity and consistency of the document structure. The detailed steps are as follows:
- Create a Document object.
- Load a Word document using the Document.loadFromFile() method.
- Create another new Document object.
- Clone the default style of the original document to the new document using the Document.cloneDefaultStyleTo(newdoc) method.
- Iterate through each section of the original document using a for loop and clone it into the new document.
- Save the resulting document using the Document.saveToFile() method.
import com.spire.doc.*;
public class ReadOneDocument {
    public static void main(String[] args) {
        // Create a new document object
        Document document = new Document();
        // Load a Word document from a file
        document.loadFromFile("Sample.docx");
        // Create a new document object
        Document newdoc = new Document();
        // Clone the default style to the new document
        document.cloneDefaultStyleTo(newdoc);
        // Iterate through each section in the original document and clone it to the new document
        // (a typed Iterable<Section> is needed; a raw Iterable yields Object and does not compile)
        for (Section sourceSection : (Iterable<Section>) document.getSections()) {
            newdoc.getSections().add(sourceSection.deepClone());
        }
        // Save the new document to a file
        newdoc.saveToFile("Content of the entire document.docx", FileFormat.Docx);
        // Close and release the new document object
        newdoc.close();
        newdoc.dispose();
        // Close and release the original document object
        document.close();
        document.dispose();
    }
}
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license for yourself.
Java: Add, Insert, or Delete Pages in Word Documents
Adding, inserting, and deleting pages are key steps in managing and presenting the content of a Word document. Adding or inserting pages lets you expand a document to hold more content while keeping it organized and readable, and deleting pages simplifies a document by removing unnecessary or erroneous information. This article demonstrates how to use Spire.Doc for Java to add, insert, and delete pages in a Word document within a Java project.
- Add a Page in a Word Document in Java
- Insert a Page in a Word Document in Java
- Delete a Page from a Word Document in Java
Install Spire.Doc for Java
First, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Add a Page in a Word Document in Java
To add a new page at the end of a Word document, locate the last section and insert a page break at the end of that section's last paragraph. Any content added afterward then starts on a new page, keeping the document structure clear and coherent. The detailed steps are as follows:
- Create a Document object.
- Load a Word document using the Document.loadFromFile() method.
- Get the body of the last section of the document using Document.getLastSection().getBody().
- Add a page break to the end of the body's last paragraph by calling Body.getLastParagraph().appendBreak(BreakType.Page_Break).
- Create a ParagraphStyle object and set its name, spacing, and font.
- Add the new paragraph style to the document's style collection using Document.getStyles().add(paragraphStyle) method.
- Create a Paragraph object and set its text content.
- Apply the previously created paragraph style to the new paragraph using Paragraph.applyStyle(paragraphStyle.getName()) method.
- Add the new paragraph to the document using Body.getChildObjects().add(paragraph) method.
- Save the resulting document using the Document.saveToFile() method.
import com.spire.doc.*;
import com.spire.doc.documents.*;
public class AddOnePage {
    public static void main(String[] args) {
        // Create a new document object
        Document document = new Document();
        // Load a sample document from a file
        document.loadFromFile("Sample.docx");
        // Get the body of the last section of the document
        Body body = document.getLastSection().getBody();
        // Insert a page break after the last paragraph in the body
        body.getLastParagraph().appendBreak(BreakType.Page_Break);
        // Create a new paragraph style
        ParagraphStyle paragraphStyle = new ParagraphStyle(document);
        paragraphStyle.setName("CustomParagraphStyle1");
        paragraphStyle.getParagraphFormat().setLineSpacing(12);
        paragraphStyle.getParagraphFormat().setAfterSpacing(8);
        paragraphStyle.getCharacterFormat().setFontName("Microsoft YaHei");
        paragraphStyle.getCharacterFormat().setFontSize(12);
        // Add the paragraph style to the document's style collection
        document.getStyles().add(paragraphStyle);
        // Create a new paragraph and set the text content
        Paragraph paragraph = new Paragraph(document);
        paragraph.appendText("Thank you for using our Spire.Doc for Java product. The trial version will add a red watermark to the generated result document and only supports converting the first 10 pages to other formats. Upon purchasing and applying a license, these watermarks will be removed, and the functionality restrictions will be lifted.");
        // Apply the paragraph style
        paragraph.applyStyle(paragraphStyle.getName());
        // Add the paragraph to the body's content collection
        body.getChildObjects().add(paragraph);
        // Create another new paragraph and set the text content
        paragraph = new Paragraph(document);
        paragraph.appendText("To fully experience our product, we provide a one-month temporary license for each of our customers for free. Please send an email to sales@e-iceblue.com, and we will send the license to you within one working day.");
        // Apply the paragraph style
        paragraph.applyStyle(paragraphStyle.getName());
        // Add the paragraph to the body's content collection
        body.getChildObjects().add(paragraph);
        // Save the document to a specified path
        document.saveToFile("Add a Page.docx", FileFormat.Docx);
        // Close the document
        document.close();
        // Dispose of the document object's resources
        document.dispose();
    }
}
Insert a Page in a Word Document in Java
Before inserting a new page, determine the index of the specified page's last paragraph within its section, and then insert the paragraphs of the new page after it one by one. To separate the new content from the pages that follow, a page break is appended to the last inserted paragraph. The detailed steps are as follows:
- Create a Document object.
- Load a Word document using the Document.loadFromFile() method.
- Create a FixedLayoutDocument object.
- Obtain the FixedLayoutPage object of a page in the document.
- Get the index position of the last paragraph on the page within the section.
- Create a ParagraphStyle object and set its name, spacing, and font.
- Add the new paragraph style to the document using the Document.getStyles().add(paragraphStyle) method.
- Create a Paragraph object and set its text content.
- Apply the previously created paragraph style to the new paragraph using the Paragraph.applyStyle(paragraphStyle.getName()) method.
- Insert the new paragraph at the specified position using the Body.getChildObjects().insert(index, Paragraph) method.
- Create another new paragraph object, set its text content, add a page break by calling the Paragraph.appendBreak(BreakType.Page_Break) method, apply the previously created paragraph style, and finally insert this paragraph into the document.
- Save the resulting document using the Document.saveToFile() method.
import com.spire.doc.*;
import com.spire.doc.pages.*;
import com.spire.doc.documents.*;
public class InsertOnePage {
    public static void main(String[] args) {
        // Create a new document object
        Document document = new Document();
        // Load a sample document from a file
        document.loadFromFile("Sample.docx");
        // Create a fixed layout document object
        FixedLayoutDocument layoutDoc = new FixedLayoutDocument(document);
        // Get the first page
        FixedLayoutPage page = layoutDoc.getPages().get(0);
        // Get the body of the section that contains the page
        Body body = page.getSection().getBody();
        // Get the paragraph at the end of the current page
        Paragraph paragraphEnd = page.getColumns().get(0).getLines().getLast().getParagraph();
        // Initialize the end index
        int endIndex = 0;
        if (paragraphEnd != null) {
            // Get the index of the last paragraph
            endIndex = body.getChildObjects().indexOf(paragraphEnd);
        }
        // Create a new paragraph style
        ParagraphStyle paragraphStyle = new ParagraphStyle(document);
        paragraphStyle.setName("CustomParagraphStyle1");
        paragraphStyle.getParagraphFormat().setLineSpacing(12);
        paragraphStyle.getParagraphFormat().setAfterSpacing(8);
        paragraphStyle.getCharacterFormat().setFontName("Microsoft YaHei");
        paragraphStyle.getCharacterFormat().setFontSize(12);
        // Add the style to the document
        document.getStyles().add(paragraphStyle);
        // Create a new paragraph and set the text content
        Paragraph paragraph = new Paragraph(document);
        paragraph.appendText("Thank you for using our Spire.Doc for Java product. The trial version will add a red watermark to the generated result document and only supports converting the first 10 pages to other formats. Upon purchasing and applying a license, these watermarks will be removed, and the functionality restrictions will be lifted.");
        // Apply the paragraph style
        paragraph.applyStyle(paragraphStyle.getName());
        // Insert the paragraph directly after the last paragraph of the page
        body.getChildObjects().insert(endIndex + 1, paragraph);
        // Create another new paragraph and set the text content
        paragraph = new Paragraph(document);
        paragraph.appendText("To fully experience our product, we provide a one-month temporary license for each of our customers for free. Please send an email to sales@e-iceblue.com, and we will send the license to you within one working day.");
        // Apply the paragraph style
        paragraph.applyStyle(paragraphStyle.getName());
        // Add a page break
        paragraph.appendBreak(BreakType.Page_Break);
        // Insert the paragraph after the first new paragraph
        body.getChildObjects().insert(endIndex + 2, paragraph);
        // Save the document to a specified path
        document.saveToFile("Insert a New Page after a Specified Page.docx", FileFormat.Docx);
        // Close and dispose of the document object's resources
        document.close();
        document.dispose();
    }
}
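The two insert() calls use endIndex + 1 and endIndex + 2 so that the new paragraphs land directly after the page's last paragraph, in order. The stdlib-only sketch below models that ordering with strings in place of body child objects; the class name and the "[PAGE BREAK]" marker are hypothetical stand-ins, not Spire.Doc API. One caveat under this model: the whole paragraph is a single child object, so if the page's last paragraph continues onto the next page, the inserted paragraphs follow that continuation rather than starting exactly at the page boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the insertion step: strings stand in for body child objects,
// and the suffix "[PAGE BREAK]" stands in for appendBreak(BreakType.Page_Break).
final class InsertPageSketch {

    static List<String> insertAfterPage(List<String> body, String lastOnPage,
                                        String first, String second) {
        List<String> result = new ArrayList<>(body);
        int endIndex = result.indexOf(lastOnPage);
        if (endIndex < 0) {
            throw new IllegalArgumentException("Last paragraph of the page not found");
        }
        // The first new paragraph goes directly after the page's last paragraph
        result.add(endIndex + 1, first);
        // The second follows the first; its trailing page break pushes the
        // content that originally came after the page onto a new page
        result.add(endIndex + 2, second + " [PAGE BREAK]");
        return result;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("P1", "P2", "P3");
        // The page ends with "P2"
        System.out.println(insertAfterPage(body, "P2", "New A", "New B"));
    }
}
```

Running main prints [P1, P2, New A, New B [PAGE BREAK], P3].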
Delete a Page from a Word Document in Java
To delete the content of a page, you first need to find the position index of the starting and ending elements of that page in the document. Then, by looping through, you can remove these elements one by one to delete the entire content of the page. The detailed steps are as follows:
- Create a Document object.
- Load a Word document using the Document.loadFromFile() method.
- Create a FixedLayoutDocument object.
- Obtain the FixedLayoutPage object of a page in the document (the second page in this example).
- Use the FixedLayoutPage.getSection() method to get the section where the page is located.
- Get the index position of the first paragraph on the page within the section.
- Get the index position of the last paragraph on the page within the section.
- Use a for loop to remove the content of the page one by one.
- Save the resulting document using the Document.saveToFile() method.
import com.spire.doc.*;
import com.spire.doc.pages.*;
import com.spire.doc.documents.*;
public class RemoveOnePage {
    public static void main(String[] args) {
        // Create a new document object
        Document document = new Document();
        // Load a sample document from a file
        document.loadFromFile("Sample.docx");
        // Create a fixed layout document object
        FixedLayoutDocument layoutDoc = new FixedLayoutDocument(document);
        // Get the second page
        FixedLayoutPage page = layoutDoc.getPages().get(1);
        // Get the section of the page
        Section section = page.getSection();
        // Get the first paragraph on the page
        Paragraph paragraphStart = page.getColumns().get(0).getLines().getFirst().getParagraph();
        int startIndex = 0;
        if (paragraphStart != null) {
            // Get the index of the starting paragraph
            startIndex = section.getBody().getChildObjects().indexOf(paragraphStart);
        }
        // Get the last paragraph on the page
        Paragraph paragraphEnd = page.getColumns().get(0).getLines().getLast().getParagraph();
        int endIndex = 0;
        if (paragraphEnd != null) {
            // Get the index of the ending paragraph
            endIndex = section.getBody().getChildObjects().indexOf(paragraphEnd);
        }
        // Remove the child objects from startIndex to endIndex; each removal shifts the
        // remaining objects left, so the object at startIndex is removed repeatedly
        for (int i = 0; i <= (endIndex - startIndex); i++) {
            section.getBody().getChildObjects().removeAt(startIndex);
        }
        // Save the document to a specified path
        document.saveToFile("Delete a Page.docx", FileFormat.Docx);
        // Close and dispose of the document object's resources
        document.close();
        document.dispose();
    }
}
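The deletion loop removes the element at startIndex on every iteration rather than at i: each removeAt() shifts the remaining child objects one position to the left, so the next element of the page moves into startIndex. The stdlib-only sketch below models this with java.util.List and checks it against the equivalent subList(...).clear() call; the class and method names are hypothetical. Under the same model, note that if the page's first paragraph actually starts on the previous page, deleting it also removes that earlier portion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the deletion loop; strings stand in for body child objects.
final class DeletePageSketch {

    // Mirrors the example: remove the element at startIndex (endIndex - startIndex + 1) times
    static List<String> removeRepeatedly(List<String> body, int startIndex, int endIndex) {
        List<String> result = new ArrayList<>(body);
        for (int i = 0; i <= (endIndex - startIndex); i++) {
            result.remove(startIndex); // remove(int): later elements shift left
        }
        return result;
    }

    // Equivalent single operation on a java.util.List: clear the inclusive range
    static List<String> removeRange(List<String> body, int startIndex, int endIndex) {
        List<String> result = new ArrayList<>(body);
        result.subList(startIndex, endIndex + 1).clear();
        return result;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("A", "B", "C", "D", "E");
        // Delete the "page" made of B, C and D (indices 1 to 3)
        System.out.println(removeRepeatedly(body, 1, 3));
        System.out.println(removeRange(body, 1, 3));
    }
}
```

Both lines print [A, E]. A loop that called removeAt(i) with i running from startIndex to endIndex would instead skip every second element, because the indices shift after each removal.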
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license for yourself.
Java: Add and Change Variables in Word Documents
Variables in Word documents are displayed through a type of field (DOCVARIABLE), which makes text management, such as text replacement and deletion, convenient and accurate. Compared with the find-and-replace function, replacing text by assigning values to variables is faster and less error-prone. This article shows how to add or change variables in Word documents programmatically using Spire.Doc for Java.
Install Spire.Doc for Java
First, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Insert Variables into Word Documents
Because variables are displayed through Word fields, you can use the Paragraph.appendField(String fieldName, FieldType.Field_Doc_Variable) method to insert variables into a Word document, and then use the VariableCollection.add() method to assign values to them. Note that after values are assigned, the document's fields must be updated before the assigned values are displayed. The detailed steps are as follows.
- Create an object of Document.
- Add a section to the document using Document.addSection() method.
- Add a paragraph to the section using Section.addParagraph() method.
- Add variable fields to the paragraph using Paragraph.appendField(String fieldName, FieldType.Field_Doc_Variable) method.
- Get the variable collection using Document.getVariables() method.
- Assign a value to the variable using VariableCollection.add() method.
- Enable field updating using the Document.isUpdateFields(true) method.
- Save the document using Document.saveToFile() method.
import com.spire.doc.*;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.formatting.CharacterFormat;
public class AddVariables {
    public static void main(String[] args) {
        // Create an object of Document
        Document document = new Document();
        // Add a section
        Section section = document.addSection();
        // Add a paragraph
        Paragraph paragraph = section.addParagraph();
        // Set text format
        CharacterFormat characterFormat = paragraph.getStyle().getCharacterFormat();
        characterFormat.setFontName("Times New Roman");
        characterFormat.setFontSize(14);
        // Set the top page margin
        section.getPageSetup().getMargins().setTop(80f);
        // Add variable fields to the paragraph
        paragraph.appendField("Term", FieldType.Field_Doc_Variable);
        paragraph.appendText(" is an object.\r\n");
        paragraph.appendField("Term", FieldType.Field_Doc_Variable);
        paragraph.appendText(" is not a backdrop, an illusion, or an emergent phenomenon.\r\n");
        paragraph.appendField("Term", FieldType.Field_Doc_Variable);
        paragraph.appendText(" has a physical size that can be measured in laboratories.");
        // Get the variable collection
        VariableCollection variableCollection = document.getVariables();
        // Assign a value to the variable
        variableCollection.add("Term", "Time");
        // Enable field updating so the fields display the assigned value
        document.isUpdateFields(true);
        // Save the document
        document.saveToFile("AddVariables.docx", FileFormat.Auto);
        document.dispose();
    }
}
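Why fields must be updated can be illustrated with a small model in which every field keeps a cached result that only changes when fields are updated. The stdlib-only sketch below is not Spire.Doc code: DocVariableSketch, Segment, and updateFields() are hypothetical names, and the HashMap plays the role of VariableCollection. It also shows that changing a value later, as in the next section, has no visible effect until the fields are updated again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of variable fields; this is not Spire.Doc code.
final class DocVariableSketch {

    // Either literal text (variable == null) or a field that displays a named variable
    static final class Segment {
        final String text;
        final String variable;
        String cachedResult = ""; // what the field currently shows

        Segment(String text, String variable) {
            this.text = text;
            this.variable = variable;
        }
    }

    final List<Segment> segments = new ArrayList<>();
    final Map<String, String> variables = new HashMap<>(); // plays the role of VariableCollection

    void appendField(String name) {
        segments.add(new Segment(null, name));
    }

    void appendText(String text) {
        segments.add(new Segment(text, null));
    }

    // Refresh every field's displayed result from the current variable values
    void updateFields() {
        for (Segment s : segments) {
            if (s.variable != null) {
                s.cachedResult = variables.getOrDefault(s.variable, "");
            }
        }
    }

    String render() {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            sb.append(s.variable != null ? s.cachedResult : s.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DocVariableSketch doc = new DocVariableSketch();
        doc.appendField("Term");
        doc.appendText(" is an object.");
        doc.variables.put("Term", "Time");
        System.out.println("[" + doc.render() + "]"); // not updated yet: "[ is an object.]"
        doc.updateFields();
        System.out.println("[" + doc.render() + "]"); // "[Time is an object.]"
        doc.variables.put("Term", "The time");        // like changing the value later
        System.out.println("[" + doc.render() + "]"); // still "[Time is an object.]"
        doc.updateFields();
        System.out.println("[" + doc.render() + "]"); // "[The time is an object.]"
    }
}
```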
Change the Value of Variables in Word Documents
Spire.Doc for Java provides the VariableCollection.set() method to change the value of a variable. After the fields in the document are updated, every occurrence of the variable displays the newly assigned value, which achieves fast and accurate text replacement. The detailed steps are as follows.
- Create an object of Document.
- Load a Word document using Document.loadFromFile() method.
- Get the variable collection using Document.getVariables() method.
- Assign a new value to a specific variable through its name using VariableCollection.set() method.
- Enable field updating using the Document.isUpdateFields(true) method.
- Save the document using Document.saveToFile() method.
import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.VariableCollection;
public class ChangeVariableValue {
    public static void main(String[] args) {
        // Create an object of Document
        Document document = new Document();
        // Load a Word document
        document.loadFromFile("AddVariables.docx");
        // Get the variable collection
        VariableCollection variableCollection = document.getVariables();
        // Assign a new value to a variable
        variableCollection.set("Term", "The time");
        // Enable field updating so the fields display the new value
        document.isUpdateFields(true);
        // Save the document
        document.saveToFile("ChangeVariable.docx", FileFormat.Auto);
        document.dispose();
    }
}
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license for yourself.
Java: Split Word Documents
When a long Word document requires teamwork, it may be necessary to split it into several shorter files and assign them to different people to speed up the workflow. Instead of cutting and pasting manually, this article demonstrates how to split a Word document programmatically using Spire.Doc for Java.
Install Spire.Doc for Java
First, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Split a Word Document by Page Break
A Word document can contain multiple pages separated by page breaks. To split a Word document by page break, you can refer to the below steps and code.
- Create a Document instance.
- Load a sample Word document using Document.loadFromFile() method.
- Create a new Word document and add a section to it.
- Traverse through all body child objects of each section in the original document and determine whether the child object is a paragraph or a table.
- If the child object of the section is a table, add it directly to the section of the new document using Section.getBody().getChildObjects().add() method.
- If the child object of the section is a paragraph, first add the paragraph object to the section of the new document. Then traverse through all child objects of the paragraph and determine whether the child object is a page break.
- If the child object is a page break, get its index and then remove the page break from its paragraph by index.
- Save the new Word document and then repeat the above processes.
import com.spire.doc.*;
import com.spire.doc.documents.*;
public class SplitDocByPageBreak {
    public static void main(String[] args) throws Exception {
        // Create a Document instance
        Document original = new Document();
        // Load a sample Word document
        original.loadFromFile("E:\\Files\\SplitByPageBreak.docx");
        // Create a new Word document and add a section to it
        Document newWord = new Document();
        Section section = newWord.addSection();
        int index = 0;
        // Traverse all sections of the original document
        for (int s = 0; s < original.getSections().getCount(); s++) {
            Section sec = original.getSections().get(s);
            // Traverse all body child objects of each section
            for (int c = 0; c < sec.getBody().getChildObjects().getCount(); c++) {
                DocumentObject obj = sec.getBody().getChildObjects().get(c);
                if (obj instanceof Paragraph) {
                    Paragraph para = (Paragraph) obj;
                    sec.cloneSectionPropertiesTo(section);
                    // Add the paragraph from the original section to the section of the new document
                    section.getBody().getChildObjects().add(para.deepClone());
                    for (int i = 0; i < para.getChildObjects().getCount(); i++) {
                        DocumentObject parobj = para.getChildObjects().get(i);
                        if (parobj instanceof Break) {
                            Break break1 = (Break) parobj;
                            if (break1.getBreakType().equals(BreakType.Page_Break)) {
                                // Get the index of the page break in the paragraph
                                int indexId = para.getChildObjects().indexOf(parobj);
                                // Remove the page break from the copied paragraph
                                Paragraph newPara = (Paragraph) section.getBody().getLastParagraph();
                                newPara.getChildObjects().removeAt(indexId);
                                // Save the new Word document (make sure the output folder exists)
                                newWord.saveToFile("output/result" + index + ".docx", FileFormat.Docx);
                                index++;
                                // Create a new document and add a section
                                newWord = new Document();
                                section = newWord.addSection();
                                // Add the paragraph again; the part before the page break is removed below
                                section.getBody().getChildObjects().add(para.deepClone());
                                if (section.getParagraphs().get(0).getChildObjects().getCount() == 0) {
                                    // Remove the first blank paragraph
                                    section.getBody().getChildObjects().removeAt(0);
                                } else {
                                    // Remove the child objects up to and including the page break
                                    while (indexId >= 0) {
                                        section.getParagraphs().get(0).getChildObjects().removeAt(indexId);
                                        indexId--;
                                    }
                                }
                            }
                        }
                    }
                }
                if (obj instanceof Table) {
                    // Add the table from the original section to the section of the new document
                    section.getBody().getChildObjects().add(obj.deepClone());
                }
            }
        }
        // Save the last part to file
        newWord.saveToFile("output/result" + index + ".docx", FileFormat.Docx);
    }
}
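The page-break splitting logic is easier to follow on a simplified model. In the stdlib-only sketch below, a paragraph is a list of text runs and the hypothetical marker "<PB>" stands in for a Break of type BreakType.Page_Break; tables are omitted, and the class name is hypothetical. Runs before a break close the current part, runs after it open the next part, and the empty remainder that a trailing break leaves at the end of a paragraph is dropped, much like the blank-paragraph removal in the example. The sketch is written so that a paragraph containing several page breaks splits at each one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a paragraph is a list of runs, and PAGE_BREAK stands in for
// a Break whose type is BreakType.Page_Break. Tables are left out for brevity.
final class PageBreakSplitSketch {

    static final String PAGE_BREAK = "<PB>";

    // Returns one list of paragraphs per output document
    static List<List<List<String>>> split(List<List<String>> paragraphs) {
        List<List<List<String>>> parts = new ArrayList<>();
        List<List<String>> current = new ArrayList<>();
        for (List<String> paragraph : paragraphs) {
            List<String> runs = new ArrayList<>();
            boolean hadBreak = false;
            for (String run : paragraph) {
                if (PAGE_BREAK.equals(run)) {
                    // Runs before the break stay in the current part, which is then closed
                    if (!runs.isEmpty()) {
                        current.add(runs);
                    }
                    parts.add(current);
                    current = new ArrayList<>();
                    runs = new ArrayList<>();
                    hadBreak = true;
                } else {
                    runs.add(run);
                }
            }
            // Keep ordinary paragraphs (even blank ones), but drop the empty
            // remainder that a trailing page break leaves behind
            if (!hadBreak || !runs.isEmpty()) {
                current.add(runs);
            }
        }
        parts.add(current); // the last part, like the final saveToFile() call
        return parts;
    }

    public static void main(String[] args) {
        List<List<String>> doc = Arrays.asList(
                Arrays.asList("Page one text", PAGE_BREAK),
                Arrays.asList("Page two", PAGE_BREAK, "Page three"),
                Arrays.asList("More page three"));
        List<List<List<String>>> parts = split(doc);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("result" + i + ": " + parts.get(i));
        }
    }
}
```

Running main prints three parts: result0 holds [[Page one text]], result1 holds [[Page two]], and result2 holds [[Page three], [More page three]].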
Split a Word Document by Section Break
In Word, a section is a part of a document that has its own page formatting. For documents that contain multiple sections, Spire.Doc for Java also supports splitting documents by section breaks. The detailed steps are as follows.
- Create a Document instance.
- Load a sample Word document using Document.loadFromFile() method.
- Define a new Word document object.
- Traverse through all sections of the original Word document.
- Clone each section of the original document using Section.deepClone() method.
- Add the cloned section to the new document as a new section using Document.getSections().add() method.
- Save the result document using Document.saveToFile() method.
import com.spire.doc.*;
public class SplitDocBySectionBreak {
    public static void main(String[] args) throws Exception {
        // Create a Document instance
        Document document = new Document();
        // Load a sample Word document
        document.loadFromFile("E:\\Files\\SplitBySectionBreak.docx");
        // Define a new Word document object
        Document newWord;
        // Traverse all sections of the original Word document
        for (int i = 0; i < document.getSections().getCount(); i++) {
            newWord = new Document();
            // Clone each section of the original document and add it to the new document as a new section
            newWord.getSections().add(document.getSections().get(i).deepClone());
            // Save the result document
            newWord.saveToFile("Result/result" + i + ".docx", FileFormat.Docx);
            newWord.dispose();
        }
        document.dispose();
    }
}
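Both splitting examples write into relative folders ("output" and "Result"). A small helper such as the hypothetical resultPath() below, which uses only java.nio.file and is not part of Spire.Doc, builds the result<i>.docx path and creates the folder first if it is missing, so the save call does not depend on the folder already existing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Spire.Doc: builds "result<index>.docx" inside
// 'directory' and creates the directory (and any missing parents) first.
final class OutputPathSketch {

    static String resultPath(String directory, int index) throws IOException {
        Path dir = Paths.get(directory);
        Files.createDirectories(dir); // does nothing if the directory already exists
        return dir.resolve("result" + index + ".docx").toString();
    }

    public static void main(String[] args) throws IOException {
        // Creates ./Result if needed and prints the path of the first result file
        System.out.println(resultPath("Result", 0));
    }
}
```

With this helper, the save call in the loop would read newWord.saveToFile(resultPath("Result", i), FileFormat.Docx);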
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license for yourself.
Java: Get All Revisions from Word
After you enable the Track Changes feature in a Word document, Word records every edit made to the document, such as insertions, deletions, replacements, and format changes. Track Changes is a useful feature that lets you see exactly what changes have been made to a document. This tutorial shows how to get all revisions from a Word document by using Spire.Doc for Java.
Install Spire.Doc for Java
First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Get All Revisions from Word
The detailed steps are as follows.
- Create a Document instance and load a sample Word document using Document.loadFromFile() method.
- Create a StringBuilder object and use StringBuilder.append() method to log the data.
- Traverse all the sections and every element under the body of each section.
- Determine whether a paragraph is an insertion revision using Paragraph.isInsertRevision() method. If it is, use Paragraph.getInsertRevision() method to get the insertion revision, then get the revision type and author using EditRevision.getType() method and EditRevision.getAuthor() method.
- Determine whether a paragraph is a delete revision using Paragraph.isDeleteRevision() method. If it is, use Paragraph.getDeleteRevision() method to get the delete revision, then get the revision type and author using EditRevision.getType() method and EditRevision.getAuthor() method.
- Traverse all the elements in each paragraph to get the revisions of its text ranges.
- Write the content of each StringBuilder to a .txt file using FileWriter.write() method.
- Java
import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.*;
import com.spire.doc.formatting.revisions.*;
import java.io.FileWriter;

public class GetRevisions {
    public static void main(String[] args) throws Exception {
        //Load the sample Word document
        Document document = new Document();
        document.loadFromFile("test file.docx");

        //Create a StringBuilder object to log the insertions
        StringBuilder insertRevision = new StringBuilder();
        insertRevision.append("Insert revisions:" + "\n");
        int index_insertRevision = 0;

        //Create a StringBuilder object to log the deletions
        StringBuilder deleteRevision = new StringBuilder();
        deleteRevision.append("Delete revisions:" + "\n");
        int index_deleteRevision = 0;

        //Traverse all the sections
        for (Section sec : (Iterable<Section>) document.getSections()) {
            //Iterate through the elements under the body of the section
            for (DocumentObject docItem : (Iterable<DocumentObject>) sec.getBody().getChildObjects()) {
                if (docItem instanceof Paragraph) {
                    Paragraph para = (Paragraph) docItem;
                    //Determine if the paragraph is an insertion revision
                    if (para.isInsertRevision()) {
                        index_insertRevision++;
                        insertRevision.append("Index: " + index_insertRevision + "\n");
                        //Get the insertion revision, its type and its author
                        EditRevision insRevision = para.getInsertRevision();
                        EditRevisionType insType = insRevision.getType();
                        insertRevision.append("Type: " + insType + "\n");
                        String insAuthor = insRevision.getAuthor();
                        insertRevision.append("Author: " + insAuthor + "\n");
                    }
                    //Determine if the paragraph is a delete revision
                    else if (para.isDeleteRevision()) {
                        index_deleteRevision++;
                        deleteRevision.append("Index: " + index_deleteRevision + "\n");
                        //Get the delete revision, its type and its author
                        EditRevision delRevision = para.getDeleteRevision();
                        EditRevisionType delType = delRevision.getType();
                        deleteRevision.append("Type: " + delType + "\n");
                        String delAuthor = delRevision.getAuthor();
                        deleteRevision.append("Author: " + delAuthor + "\n");
                    }
                    //Iterate through the elements in the paragraph
                    for (DocumentObject obj : (Iterable<DocumentObject>) para.getChildObjects()) {
                        if (obj instanceof TextRange) {
                            TextRange textRange = (TextRange) obj;
                            //Determine if the text range is an insertion revision
                            if (textRange.isInsertRevision()) {
                                index_insertRevision++;
                                insertRevision.append("Index: " + index_insertRevision + "\n");
                                EditRevision insRevision = textRange.getInsertRevision();
                                EditRevisionType insType = insRevision.getType();
                                insertRevision.append("Type: " + insType + "\n");
                                String insAuthor = insRevision.getAuthor();
                                insertRevision.append("Author: " + insAuthor + "\n");
                            }
                            //Determine if the text range is a delete revision
                            else if (textRange.isDeleteRevision()) {
                                index_deleteRevision++;
                                deleteRevision.append("Index: " + index_deleteRevision + "\n");
                                EditRevision delRevision = textRange.getDeleteRevision();
                                EditRevisionType delType = delRevision.getType();
                                deleteRevision.append("Type: " + delType + "\n");
                                String delAuthor = delRevision.getAuthor();
                                deleteRevision.append("Author: " + delAuthor + "\n");
                            }
                        }
                    }
                }
            }
        }

        //Save the insertions to a .txt file (try-with-resources closes the writer even on errors)
        try (FileWriter writer1 = new FileWriter("insertRevisions.txt")) {
            writer1.write(insertRevision.toString());
        }
        //Save the deletions to a .txt file
        try (FileWriter writer2 = new FileWriter("deleteRevisions.txt")) {
            writer2.write(deleteRevision.toString());
        }
    }
}
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Java: Count the Number of Words in a Word Document
Microsoft Word provides a real-time word counter that counts the number of words in a document as you type. Beyond that, Microsoft Word also counts the number of pages, paragraphs, and characters with or without spaces. In this article, you will learn how to programmatically count the number of words or characters in an existing Word document using Spire.Doc for Java.
Install Spire.Doc for Java
First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Count the Number of Words in a Word Document
The detailed steps are as follows:
- Create a Document instance.
- Load a sample Word document using Document.loadFromFile() method.
- Count the number of words using Document.getBuiltinDocumentProperties().getWordCount() method.
- Count the number of characters without spaces using Document.getBuiltinDocumentProperties().getCharCount() method.
- Count the number of characters with spaces using Document.getBuiltinDocumentProperties().getCharCountWithSpace() method.
- Java
import com.spire.doc.*;

public class CountWordsNumber {
    public static void main(String[] args) {
        //Create a Document instance
        Document document = new Document();
        //Load a sample Word document
        document.loadFromFile("Demo.docx");

        //Count the number of words
        System.out.println("WordCount: " + document.getBuiltinDocumentProperties().getWordCount());
        //Count the number of characters without spaces
        System.out.println("CharCount: " + document.getBuiltinDocumentProperties().getCharCount());
        //Count the number of characters with spaces
        System.out.println("CharCountWithSpace: " + document.getBuiltinDocumentProperties().getCharCountWithSpace());
    }
}
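The values above come from the document's built-in properties. As a rough cross-check, the three counts can be approximated from plain text with the JDK alone. This is a sketch under simplified rules chosen here (a word is a run of non-whitespace characters, and line breaks are left out of the character count), not Word's exact counting algorithm. The class TextCounts is hypothetical, and the input text would first have to be extracted from the document, for example with Document.getText() (assumed to be available in your Spire.Doc version).

```java
public class TextCounts {
    // Simplified rule chosen for this sketch (not Word's exact algorithm):
    // a word is a run of non-whitespace characters.
    static int wordCount(String text) {
        String trimmed = text.strip();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Characters without spaces: every code point that is not whitespace.
    static int charCountWithoutSpaces(String text) {
        return (int) text.codePoints().filter(cp -> !Character.isWhitespace(cp)).count();
    }

    // Characters with spaces: every code point except line breaks.
    static int charCountWithSpaces(String text) {
        return (int) text.codePoints().filter(cp -> cp != '\n' && cp != '\r').count();
    }

    public static void main(String[] args) {
        String sample = "Spire.Doc counts words.\nSecond line here";
        System.out.println("WordCount: " + wordCount(sample));                   // 6
        System.out.println("CharCount: " + charCountWithoutSpaces(sample));       // 35
        System.out.println("CharCountWithSpace: " + charCountWithSpaces(sample)); // 39
    }
}
```

Small differences from the built-in properties are expected, since Word applies its own rules to punctuation, fields, and other content.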
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Java: Insert Math Equations in Word
Math equations are mathematical expressions commonly used in fields such as physics, engineering, computer science, and economics. When creating a professional Word document, you may sometimes need to include math equations to explain complex concepts, solve problems, or support specific arguments. In this article, you will learn how to insert math equations into Word documents in Java using Spire.Doc for Java.
Install Spire.Doc for Java
First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Insert Math Equations into a Word Document in Java
Spire.Doc for Java allows generating math equations from LaTeX code and MathML code using OfficeMath.fromLatexMathCode(String latexMathCode) and OfficeMath.fromMathMLCode(String mathMLCode) methods. The following are the detailed steps.
- Create two string arrays containing LaTeX code and MathML code respectively.
- Create a Document instance and add a section to it using Document.addSection() method.
- Iterate through each LaTeX code in the string array.
- Create a math equation from the LaTeX code using OfficeMath.fromLatexMathCode() method.
- Add a paragraph to the section, then add the math equation to the paragraph using Paragraph.getItems().add() method.
- Iterate through each MathML code in the string array.
- Create a math equation from the MathML code using OfficeMath.fromMathMLCode() method.
- Add a paragraph to the section, then add the math equation to the paragraph using Paragraph.getItems().add() method.
- Save the result document using Document.saveToFile() method.
- Java
import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.omath.*;

public class AddMathEquations {
    public static void main(String[] args) {
        //Create a string array containing LaTeX code
        String[] latexMathCode = {
                "x^{2}+\\sqrt{x^{2}+1}=2",
                "\\cos (2\\theta) = \\cos^2 \\theta - \\sin^2 \\theta",
                "k_{n+1} = n^2 + k_n^2 - k_{n-1}",
                "\\frac {\\frac {1}{x}+ \\frac {1}{y}}{y-z}",
                "\\int_0^ \\infty \\mathrm {e}^{-x} \\, \\mathrm {d}x",
                "\\forall x \\in X, \\quad \\exists y \\leq \\epsilon",
                "\\alpha, \\beta, \\gamma, \\Gamma, \\pi, \\Pi, \\phi, \\varphi, \\mu, \\Phi",
                "A_{m,n} = \\begin{pmatrix} a_{1,1} & a_{1,2} & \\cdots & a_{1,n} \\\\ a_{2,1} & a_{2,2} & \\cdots & a_{2,n} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ a_{m,1} & a_{m,2} & \\cdots & a_{m,n} \\end{pmatrix}",
        };

        //Create a string array containing MathML code
        String[] mathMLCode = {
                "<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mi>a</mi><mo>≠</mo><mn>0</mn></math>",
                "<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mi>a</mi><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mi>b</mi><mi>x</mi><mo>+</mo><mi>c</mi><mo>=</mo><mn>0</mn></math>",
                "<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mi>x</mi><mo>=</mo><mrow><mfrac><mrow><mo>−</mo><mi>b</mi><mo>±</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi></msqrt></mrow><mrow><mn>2</mn><mi>a</mi></mrow></mfrac></mrow></math>",
        };

        //Create a Document instance
        Document doc = new Document();
        //Add a section
        Section section = doc.addSection();

        //Add a heading paragraph to the section
        Paragraph textPara = section.addParagraph();
        textPara.appendText("Creating Equations from LaTeX Code");
        textPara.applyStyle(BuiltinStyle.Heading_1);
        textPara.getFormat().setHorizontalAlignment(HorizontalAlignment.Center);

        //Iterate through each LaTeX code in the string array
        for (int i = 0; i < latexMathCode.length; i++) {
            //Create a math equation from the LaTeX code
            OfficeMath officeMath = new OfficeMath(doc);
            officeMath.fromLatexMathCode(latexMathCode[i]);
            //Add the math equation to a new paragraph, followed by an empty paragraph for spacing
            Paragraph paragraph = section.addParagraph();
            paragraph.getItems().add(officeMath);
            section.addParagraph();
        }

        //Add a heading paragraph to the section
        textPara = section.addParagraph();
        textPara.appendText("Creating Equations from MathML Code");
        textPara.applyStyle(BuiltinStyle.Heading_1);
        textPara.getFormat().setHorizontalAlignment(HorizontalAlignment.Center);

        //Iterate through each MathML code in the string array
        for (int j = 0; j < mathMLCode.length; j++) {
            //Create a math equation from the MathML code
            OfficeMath officeMath = new OfficeMath(doc);
            officeMath.fromMathMLCode(mathMLCode[j]);
            //Add the math equation to a new paragraph, followed by an empty paragraph for spacing
            Paragraph paragraph = section.addParagraph();
            paragraph.getItems().add(officeMath);
            section.addParagraph();
        }

        //Save the result document
        doc.saveToFile("AddMathEquations.docx", FileFormat.Docx_2016);
        doc.dispose();
    }
}
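Long LaTeX strings such as the pmatrix example are easy to break with a missing brace, and every LaTeX backslash has to be doubled inside a Java string literal. The standalone sketch below (the class LatexBraceCheck is hypothetical) only checks that curly braces are balanced before a string is handed to fromLatexMathCode(). It does not validate LaTeX commands, and how Spire.Doc reports malformed LaTeX is not described in this article.

```java
public class LatexBraceCheck {
    // Returns true if every '{' is closed by a later '}' and no '}' appears unopened.
    // A backslash makes the next character part of a command or escape,
    // so \{, \} and \\ are not counted as grouping braces.
    static boolean bracesBalanced(String latex) {
        int depth = 0;
        for (int i = 0; i < latex.length(); i++) {
            char c = latex.charAt(i);
            if (c == '\\') {
                i++; // skip the character after the backslash
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // closing brace without a matching opening brace
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        String[] samples = {
                "x^{2}+\\sqrt{x^{2}+1}=2",
                "\\frac {\\frac {1}{x}+ \\frac {1}{y}}{y-z}",
                "k_{n+1} = n^2 + k_{n-1", // missing closing brace
        };
        for (String s : samples) {
            System.out.println(bracesBalanced(s) + "  " + s);
        }
    }
}
```

Running such a check over latexMathCode before the loop makes a typo show up as a clear message instead of an unexpected equation in the output document.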
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Java: Compare Two Word Documents
Document comparison is the process of checking new versions of a document against previous copies in order to identify changes made by different contributors. These differences may include additions or omissions of words, sentences or paragraphs, and formatting adjustments. This article demonstrates how to compare two Word documents in Java using Spire.Doc for Java.
- Compare Two Documents and Save Result in a Third Word Document
- Compare Two Documents and Return Insertions and Deletions in Lists
Below is a screenshot of the two Word documents that’ll be compared.
Install Spire.Doc for Java
First, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
Compare Two Documents and Save Result in a Third Word Document
Saving the comparison result in a separate Word document allows users to see all the changes made to the original document, including insertions, deletions, and formatting modifications. The following are the steps to compare two documents and save the result in a third Word document using Spire.Doc for Java.
- Load two Word documents separately while initializing the Document objects.
- Compare these two documents using Document.compare() method.
- Save the result in a third Word document using Document.saveToFile() method.
- Java
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class CompareDocuments {
    public static void main(String[] args) {
        //Load one Word document
        Document doc1 = new Document("C:\\Users\\Administrator\\Desktop\\original.docx");
        //Load the other Word document
        Document doc2 = new Document("C:\\Users\\Administrator\\Desktop\\revised.docx");

        //Compare the two documents
        doc1.compare(doc2, "John");

        //Save the differences in a third document
        doc1.saveToFile("Differences.docx", FileFormat.Docx_2013);
        doc1.dispose();
    }
}
Compare Two Documents and Return Insertions and Deletions in Lists
Sometimes you may only care about the insertions and deletions rather than all the differences. The following are the steps to get insertions and deletions in two separate lists.
- Load two Word documents separately while initializing the Document objects.
- Compare the two documents using Document.compare() method.
- Get the revisions by passing the compared document to the constructor of the DifferRevisions class.
- Get a list of insertions using DifferRevisions.getInsertRevisions() method.
- Get a list of deletions using DifferRevisions.getDeleteRevisions() method.
- Loop through the elements in the two lists to get the specific insertion and deletion.
- Java
import com.spire.doc.DifferRevisions;
import com.spire.doc.Document;
import com.spire.doc.fields.TextRange;
import com.spire.ms.System.Collections.Generic.List;

public class CompareReturnResultsInLists {
    public static void main(String[] args) {
        //Load one Word document
        Document doc1 = new Document("C:\\Users\\Administrator\\Desktop\\original.docx");
        //Load the other Word document
        Document doc2 = new Document("C:\\Users\\Administrator\\Desktop\\revised.docx");

        //Compare the two Word documents
        doc1.compare(doc2, "Author");

        //Get the revisions
        DifferRevisions differRevisions = new DifferRevisions(doc1);
        //Return the insertion revisions in a list
        List insertRevisionsList = differRevisions.getInsertRevisions();
        //Return the deletion revisions in a list
        List deleteRevisionsList = differRevisions.getDeleteRevisions();

        //Counters for the insertions and deletions that are text ranges
        int m = 0;
        int n = 0;

        //Loop through the insertion revision list
        for (int i = 0; i < insertRevisionsList.size(); i++) {
            if (insertRevisionsList.get(i) instanceof TextRange) {
                m += 1;
                //Get the specific revision and print its content
                TextRange textRange = (TextRange) insertRevisionsList.get(i);
                System.out.println("Insertion #" + m + ":" + textRange.getText());
            }
        }
        System.out.println("============================================");
        //Loop through the deletion revision list
        for (int i = 0; i < deleteRevisionsList.size(); i++) {
            if (deleteRevisionsList.get(i) instanceof TextRange) {
                n += 1;
                //Get the specific revision and print its content
                TextRange textRange = (TextRange) deleteRevisionsList.get(i);
                System.out.println("Deletion #" + n + ":" + textRange.getText());
            }
        }
    }
}
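This article does not say which algorithm Document.compare() uses internally. To illustrate what the insertion and deletion lists represent, the sketch below computes a word-level diff with the classic longest common subsequence (LCS) technique, using only the JDK. It is a generic stand-in rather than Spire.Doc's implementation, and unlike compare() it ignores formatting entirely. The names WordDiff and Result are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class WordDiff {
    // Words found only in the revised text (insertions) or only in the original (deletions).
    static final class Result {
        final List<String> insertions = new ArrayList<>();
        final List<String> deletions = new ArrayList<>();
    }

    private static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Words outside the longest common subsequence are deletions (from the original)
    // or insertions (from the revision).
    static Result diff(String original, String revised) {
        String[] a = tokenize(original);
        String[] b = tokenize(revised);
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        Result result = new Result();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                i++;
                j++; // unchanged word
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                result.deletions.add(a[i++]); // word dropped from the original
            } else {
                result.insertions.add(b[j++]); // word added in the revision
            }
        }
        while (i < a.length) result.deletions.add(a[i++]);
        while (j < b.length) result.insertions.add(b[j++]);
        return result;
    }

    public static void main(String[] args) {
        Result r = diff("The quick brown fox jumps", "The quick red fox leaps");
        for (int k = 0; k < r.insertions.size(); k++) {
            System.out.println("Insertion #" + (k + 1) + ":" + r.insertions.get(k));
        }
        System.out.println("============================================");
        for (int k = 0; k < r.deletions.size(); k++) {
            System.out.println("Deletion #" + (k + 1) + ":" + r.deletions.get(k));
        }
    }
}
```

For the sample pair this reports "red" and "leaps" as insertions and "brown" and "jumps" as deletions, mirroring the shape of the output printed by the Spire.Doc example.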
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Add, Count, Retrieve and Remove Variables in Word in Java
This article demonstrates how to add, count, retrieve and remove variables in a Word document in Java using Spire.Doc for Java library.
Add a Variable
The following example adds a document variable named "A1" with a value of 12 to a Word document.
import com.spire.doc.Document;
import com.spire.doc.FieldType;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;
import com.spire.doc.documents.Paragraph;

public class AddVariables {
    public static void main(String[] args) {
        //Create a Document instance
        Document document = new Document();
        //Add a section
        Section section = document.addSection();
        //Add a paragraph to the section
        Paragraph paragraph = section.addParagraph();

        //Add a DocVariable field that refers to the variable "A1"
        paragraph.appendField("A1", FieldType.Field_Doc_Variable);
        //Add the document variable "A1" with the value "12"
        document.getVariables().add("A1", "12");

        //Update fields in the document
        document.isUpdateFields(true);

        //Save the result document
        document.saveToFile("AddVariables.docx", FileFormat.Docx_2013);
    }
}
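Conceptually, a DocVariable field is a placeholder whose displayed result is looked up from the document's variables when fields are updated, which is why the example calls isUpdateFields(true) before saving. The JDK-only sketch below models that lookup on plain text, writing field codes as { DOCVARIABLE Name } in the brace notation Word uses when field codes are shown. It is a simplified illustration, not how Spire.Doc or Word evaluate fields internally; the class DocVariableModel and its fallback text for missing variables are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocVariableModel {
    // Matches field codes written as { DOCVARIABLE Name } or { DOCVARIABLE "Name" }.
    private static final Pattern FIELD =
            Pattern.compile("\\{\\s*DOCVARIABLE\\s+\"?([^\"\\s}]+)\"?\\s*\\}");

    // Replaces each DOCVARIABLE field code with the current value of its variable.
    static String updateFields(String text, Map<String, String> variables) {
        Matcher m = FIELD.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            // Hypothetical fallback text used when the variable does not exist
            String value = variables.getOrDefault(name, "[missing variable " + name + "]");
            m.appendReplacement(out, Matcher.quoteReplacement(value)); // treat $ and \ literally
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("A1", "12");
        String body = "Value of A1: { DOCVARIABLE A1 }";
        System.out.println(updateFields(body, variables)); // Value of A1: 12
        variables.remove("A1");
        System.out.println(updateFields(body, variables)); // Value of A1: [missing variable A1]
    }
}
```

The second call mirrors the removal example later in this article: once a variable is gone, any field that refers to it can no longer show the old value after an update.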
Count the Number of Variables
import com.spire.doc.Document;

public class CountVariables {
    public static void main(String[] args) {
        //Load the Word document
        Document document = new Document();
        document.loadFromFile("AddVariables.docx");

        //Get the number of variables in the document
        int number = document.getVariables().getCount();

        StringBuilder content = new StringBuilder();
        content.append("The number of variables is: " + number);
        System.out.println(content.toString());
    }
}
Retrieve Name and Value of a Variable
import com.spire.doc.Document;

public class RetrieveVariables {
    public static void main(String[] args) {
        //Load the Word document
        Document document = new Document();
        document.loadFromFile("AddVariables.docx");

        //Retrieve the name of a variable by index
        String s1 = document.getVariables().getNameByIndex(0);
        //Retrieve the value of a variable by index
        String s2 = document.getVariables().getValueByIndex(0);
        //Retrieve the value of a variable by name
        String s3 = document.getVariables().get("A1");

        System.out.println("The name of the variable retrieved by index 0 is: " + s1);
        System.out.println("The value of the variable retrieved by index 0 is: " + s2);
        System.out.println("The value of the variable retrieved by name \"A1\" is: " + s3);
    }
}
Remove a Specific Variable
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class RemoveVariables {
    public static void main(String[] args) {
        //Load the Word document
        Document document = new Document();
        document.loadFromFile("AddVariables.docx");

        //Remove a variable by name
        document.getVariables().remove("A1");

        //Update fields in the document
        document.isUpdateFields(true);

        //Save the result document
        document.saveToFile("RemoveVariables.docx", FileFormat.Docx_2013);
    }
}
Insert Math Equations and Symbols in a Word Document in Java
This article demonstrates how to insert math equations (LaTeX and MathML equations) and symbols into a Word document in Java using Spire.Doc for Java.
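Each MathML string passed to fromMathMLCode() has to be well-formed XML. The strings in the example below use the mml: prefix, while an unprefixed <math xmlns="..."> element binds the same MathML namespace. The sketch below uses the JDK's namespace-aware DOM parser to check that a string parses and that its root is a MathML math element in either form. The class MathMLCheck is hypothetical; whether Spire.Doc accepts MathML without a namespace declaration is not covered here, and this check simply rejects it. It is meant for trusted, hard-coded strings like the ones in this article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MathMLCheck {
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    // True if the string is well-formed XML whose root element is <math> in the
    // MathML namespace, with or without a prefix such as mml:.
    static boolean isMathML(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // resolve prefixes to namespace URIs
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler throws on fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return MATHML_NS.equals(root.getNamespaceURI()) && "math".equals(root.getLocalName());
        } catch (Exception e) {
            return false; // not well-formed, or the parser could not be created
        }
    }

    public static void main(String[] args) {
        String prefixed = "<mml:math xmlns:mml=\"" + MATHML_NS + "\"><mml:mi>x</mml:mi></mml:math>";
        String unprefixed = "<math xmlns=\"" + MATHML_NS + "\"><mi>a</mi><mo>\u2260</mo><mn>0</mn></math>";
        String broken = "<math xmlns=\"" + MATHML_NS + "\"><mi>a</math>";
        System.out.println(isMathML(prefixed));   // true
        System.out.println(isMathML(unprefixed)); // true
        System.out.println(isMathML(broken));     // false
    }
}
```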
import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;
import com.spire.doc.Table;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.omath.OfficeMath;

public class AddMathEquationsAndSymbols {
    public static void main(String[] args) {
        //LaTeX code for equations
        String[] latexMathCode1 = {
                "x^{2}+\\sqrt{{x^{2}+1}}+1",
                "2\\alpha - \\sin y + x",
        };

        //MathML code for equations
        String[] mathMLCode = {
                "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\"><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msqrt><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:msqrt><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math>",
                "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\"><mml:mn>2</mml:mn><mml:mi>α</mml:mi><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:mi>x</mml:mi></mml:math>",
        };

        //LaTeX code for symbols
        String[] latexMathCode2 = {
                "\\alpha",
                "\\beta",
        };

        //Create a Document instance
        Document doc = new Document();
        //Load the Word document
        doc.loadFromFile("MathEquationTemplate.docx");

        //Get the first section
        Section section = doc.getSections().get(0);

        Paragraph paragraph = null;
        OfficeMath officeMath;

        //Insert LaTeX equations
        Table table1 = section.getTables().get(0);
        for (int i = 0; i < latexMathCode1.length; i++) {
            //Insert LaTeX code to the first column of the table
            paragraph = table1.getRows().get(i + 1).getCells().get(0).addParagraph();
            paragraph.setText(latexMathCode1[i]);
            //Insert the equation to the second column of the table
            paragraph = table1.getRows().get(i + 1).getCells().get(1).addParagraph();
            officeMath = new OfficeMath(doc);
            officeMath.fromLatexMathCode(latexMathCode1[i]);
            paragraph.getItems().add(officeMath);
        }

        //Insert MathML equations
        Table table2 = section.getTables().get(1);
        for (int i = 0; i < mathMLCode.length; i++) {
            //Insert MathML code to the first column of the table
            paragraph = table2.getRows().get(i + 1).getCells().get(0).addParagraph();
            paragraph.setText(mathMLCode[i]);
            //Insert the equation to the second column of the table
            paragraph = table2.getRows().get(i + 1).getCells().get(1).addParagraph();
            officeMath = new OfficeMath(doc);
            officeMath.fromMathMLCode(mathMLCode[i]);
            paragraph.getItems().add(officeMath);
        }

        //Insert symbols
        Table table3 = section.getTables().get(2);
        for (int i = 0; i < latexMathCode2.length; i++) {
            //Insert LaTeX code to the first column of the table
            paragraph = table3.getRows().get(i + 1).getCells().get(0).addParagraph();
            paragraph.setText(latexMathCode2[i]);
            //Insert symbols to the second column of the table
            paragraph = table3.getRows().get(i + 1).getCells().get(1).addParagraph();
            officeMath = new OfficeMath(doc);
            officeMath.fromLatexMathCode(latexMathCode2[i]);
            paragraph.getItems().add(officeMath);
        }

        //Save the document
        String result = "addMathEquationAndSymbols.docx";
        doc.saveToFile(result, FileFormat.Docx_2013);
    }
}
Output: