Java: Split Word Documents

When a long Word document needs several people working on it, splitting it into shorter files and assigning each file to a different person can speed up the workflow. Rather than cutting and pasting by hand, you can split a Word document programmatically with Spire.Doc for Java, as this article demonstrates.

Install Spire.Doc for Java

First, add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>12.4.6</version>
    </dependency>
</dependencies>
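
Once the dependency is declared, one quick way to confirm that the library is actually on the classpath is to try loading one of its classes by name. The sketch below is not part of Spire.Doc: `ClasspathCheck` and `isOnClasspath` are hypothetical names chosen for illustration, and `com.spire.doc.Document` is simply the class that the examples in this article import.

```java
// Hypothetical helper (not part of Spire.Doc): checks whether a class can be found on the classpath.
final class ClasspathCheck {

    // Returns true if the named class can be located; the class is not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class name below is taken from the imports used in this article's examples.
        String className = "com.spire.doc.Document";
        if (isOnClasspath(className)) {
            System.out.println(className + " found");
        } else {
            System.out.println(className + " not found; check the JAR or the pom.xml entries");
        }
    }
}
```

If the check reports that the class is missing, the usual suspects are a JAR that was never added to the build path or a Maven repository entry that did not resolve.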
    

Split a Word Document by Page Break

A Word document can contain multiple pages separated by page breaks. To split a Word document by page break, follow the steps and code below.

  • Create a Document instance.
  • Load a sample Word document using the Document.loadFromFile() method.
  • Create a new Word document and add a section to it.
  • Traverse all body child objects of each section in the original document and determine whether each child object is a paragraph or a table.
  • If the child object is a table, add it directly to the section of the new document using the Section.getBody().getChildObjects().add() method.
  • If the child object is a paragraph, first add it to the section of the new document. Then traverse the child objects of the paragraph and check whether each one is a page break.
  • If a child object is a page break, get its index and remove the page break from the copied paragraph by that index.
  • Save the new Word document, then create another new document and repeat the process for the remaining content.
import com.spire.doc.*;
import com.spire.doc.documents.*;

public class SplitDocByPageBreak {
    public static void main(String[] args) throws Exception {
        // Create a Document instance
        Document original = new Document();

        // Load a sample Word document
        original.loadFromFile("E:\\Files\\SplitByPageBreak.docx");

        // Create a new Word document and add a section to it
        Document newWord = new Document();
        Section section = newWord.addSection();
        int index = 0;

        //Traverse through all sections of original document
        for (int s = 0; s < original.getSections().getCount(); s++) {
            Section sec = original.getSections().get(s);

            //Traverse through all body child objects of each section.
            for (int c = 0; c < sec.getBody().getChildObjects().getCount(); c++) {
                DocumentObject obj = sec.getBody().getChildObjects().get(c);
                if (obj instanceof Paragraph) {
                    Paragraph para = (Paragraph) obj;
                    sec.cloneSectionPropertiesTo(section);

                    //Add paragraph object in original section into section of new document
                    section.getBody().getChildObjects().add(para.deepClone());
                    for (int i = 0; i < para.getChildObjects().getCount(); i++) {
                        DocumentObject parobj = para.getChildObjects().get(i);
                        if (parobj instanceof Break) {
                            Break break1 = (Break) parobj;
                            if (break1.getBreakType().equals(BreakType.Page_Break)) {

                                //Get the index of page break in paragraph
                                int indexId = para.getChildObjects().indexOf(parobj);

                                //Remove the page break from its paragraph
                                Paragraph newPara = (Paragraph) section.getBody().getLastParagraph();
                                newPara.getChildObjects().removeAt(indexId);

                                //Save the new Word document
                                newWord.saveToFile("output/result"+index+".docx", FileFormat.Docx);
                                index++;

                                //Create a new document and add a section
                                newWord = new Document();
                                section = newWord.addSection();

                                //Add paragraph object in original section into section of new document
                                section.getBody().getChildObjects().add(para.deepClone());
                                if (section.getParagraphs().get(0).getChildObjects().getCount() == 0) {

                                    //Remove the first blank paragraph
                                    section.getBody().getChildObjects().removeAt(0);
                                } else {

                                    //Remove the child objects before the page break
                                    while (indexId >= 0) {
                                        section.getParagraphs().get(0).getChildObjects().removeAt(indexId);
                                        indexId--;
                                    }
                                }
                            }
                        }
                    }
                }
                if (obj instanceof Table) {
                    //Add table object in original section into section of new document
                    section.getBody().getChildObjects().add(obj.deepClone());
                }
            }
        }

        //Save to file
        newWord.saveToFile("output/result"+index+".docx", FileFormat.Docx);
    }
}
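
The traversal above can be easier to follow on a stripped-down model. The following sketch does not use Spire.Doc at all. It assumes each paragraph is a plain string, uses the form feed character as a stand-in for a page break, and ignores tables and section properties. `PageBreakSplitSketch` and `split` are hypothetical names. It only illustrates the idea of copying content into the current part, closing that part when a break is found, and carrying the text after the break into the next part.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of splitting by page break; an illustration only, not Spire.Doc code.
final class PageBreakSplitSketch {

    // Assumption: a form feed inside a paragraph string marks a page break.
    static final char PAGE_BREAK = '\f';

    // Returns one list of paragraph strings per output part.
    static List<List<String>> split(List<String> paragraphs) {
        List<List<String>> parts = new ArrayList<>();
        List<String> current = new ArrayList<>();

        for (String para : paragraphs) {
            int start = 0;
            int br;
            // Each page break closes the current part and starts a new one.
            while ((br = para.indexOf(PAGE_BREAK, start)) >= 0) {
                String before = para.substring(start, br);
                if (!before.isEmpty()) {
                    current.add(before); // text before the break stays in the current part
                }
                parts.add(current);
                current = new ArrayList<>();
                start = br + 1; // skip the break itself
            }
            String rest = para.substring(start);
            // Keep untouched paragraphs as they are; drop an empty remainder left by a break.
            if (start == 0 || !rest.isEmpty()) {
                current.add(rest);
            }
        }
        parts.add(current); // the content after the last break
        return parts;
    }
}
```

As in the Spire.Doc example, the last part is only emitted after the loop ends, which corresponds to the final saveToFile() call outside the nested loops.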

Split a Word Document by Section Break

In Word, a section is a part of a document that has its own page formatting. For documents that contain multiple sections, Spire.Doc for Java also supports splitting by section break. The detailed steps are as follows.

  • Create a Document instance.
  • Load a sample Word document using the Document.loadFromFile() method.
  • Define a new Word document object.
  • Traverse all sections of the original Word document, creating a new document for each one.
  • Clone each section of the original document using the Section.deepClone() method.
  • Add the cloned section to the new document as a new section using the Document.getSections().add() method.
  • Save the result document using the Document.saveToFile() method.
import com.spire.doc.*;
public class SplitDocBySectionBreak {
    public static void main(String[] args) throws Exception {
        //Create Document instance
        Document document = new Document();

        //Load a sample Word document
        document.loadFromFile("E:\\Files\\SplitBySectionBreak.docx");

        //Define a new Word document object
        Document newWord;

        //Traverse through all sections of the original Word document
        for (int i = 0; i < document.getSections().getCount(); i++){
            newWord = new Document();

            //Clone each section of the original document and add it to the new document as new section
            newWord.getSections().add(document.getSections().get(i).deepClone());

            //Save the result document 
            newWord.saveToFile("Result/result"+i+".docx");
        }
    }
}
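
Both examples write their results into a relative folder ("output/" and "Result/"). Whether saveToFile() creates a missing folder is not stated here, so creating it up front is a defensive assumption rather than a documented requirement. The helper below is a minimal sketch using only the standard library. `OutputPaths` and `partPath` are hypothetical names, and the ".docx" suffix mirrors the file names used above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Spire.Doc): builds the output path for one split part.
final class OutputPaths {

    // Ensures the directory exists and returns "<directory>/<baseName><index>.docx".
    static String partPath(String directory, String baseName, int index) {
        Path dir = Paths.get(directory);
        try {
            // Creates the directory and any missing parents; does nothing if it already exists.
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not create output directory " + dir, e);
        }
        return dir.resolve(baseName + index + ".docx").toString();
    }
}
```

With this in place, the save call in the loop could be written as `newWord.saveToFile(OutputPaths.partPath("Result", "result", i));`.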

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, you can request a 30-day trial license.