Java: Extract Word Paragraphs that Use a Specific Style

Wednesday, 01 December 2021 08:11 Written by  support iceblue

Every paragraph in a Word document uses a paragraph style, whether it was applied intentionally or not. The style can be a built-in one, such as Heading 1 or Heading 2, or a custom one. This article shows how to extract the paragraphs that use a specific style with Spire.Doc for Java.

The table below lists style names as they appear in MS Word and the corresponding names in Spire.Doc. The rule is simple: the style name returned in code is the MS Word name with the spaces removed.

Style Name in MS Word | Style Name in Spire.Doc
Title                 | Title
Subtitle              | Subtitle
Heading 1             | Heading1
Heading 2             | Heading2
Heading 3             | Heading3
No Spacing            | NoSpacing
Quote                 | Quote
Intense Quote         | IntenseQuote
List Paragraph        | ListParagraph
Normal                | Normal
Custom Name           | CustomName
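
The naming rule in the table can be captured in a few lines of plain Java. The sketch below is only an illustration of that rule: the class StyleNameMapper and the method toSpireStyleName are hypothetical names, not part of Spire.Doc, and the code assumes that removing spaces is the only difference between the two names, as the table suggests.

```java
class StyleNameMapper {

    // Hypothetical helper (not a Spire.Doc API): converts a style name as shown
    // in MS Word into the form listed in the table by removing the spaces.
    static String toSpireStyleName(String wordStyleName) {
        return wordStyleName.replace(" ", "");
    }

    public static void main(String[] args) {
        String[] wordNames = {"Heading 1", "Intense Quote", "List Paragraph", "Normal"};
        for (String name : wordNames) {
            // Prints each MS Word name next to its space-free counterpart
            System.out.println(name + " -> " + toSpireStyleName(name));
        }
    }
}
```

A helper like this lets you write the style name the way it appears in the Word UI and still compare it against the value that the library returns.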

Install Spire.Doc for Java

First, add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

  • Package Manager
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>4.11.8</version>
    </dependency>
</dependencies>

Extract Paragraphs that Use a Specific Style

The style name of a paragraph can be obtained with the Paragraph.getStyleName() method. If the style name matches the one you are looking for, you can get the paragraph content with the Paragraph.getText() method. The steps to extract paragraphs that use a specific style are as follows.

  • Load a sample Word document while initializing the Document object.
  • Loop through the sections in the document.
  • Loop through the paragraphs of each section, getting each paragraph using Section.getParagraphs().get() method.
  • Get the paragraph's style name using Paragraph.getStyleName() method and determine whether it is "Heading1", the Spire.Doc name of the "Heading 1" style.
  • If yes, extract the text of the paragraph using Paragraph.getText() method.

  • Java
import com.spire.doc.Document;
import com.spire.doc.documents.Paragraph;

public class GetParagraphByStyleName {
    public static void main(String[] args) {

        //Load a sample Word document while initializing the Document object
        Document doc = new Document("C:\\Users\\Administrator\\Desktop\\Styles.docx");

        //Loop through the sections
        for (int i = 0; i < doc.getSections().getCount(); i++) {

            //Loop through the paragraphs of the current section
            for (int j = 0; j < doc.getSections().get(i).getParagraphs().getCount(); j++) {

                //Get the current paragraph
                Paragraph paragraph = doc.getSections().get(i).getParagraphs().get(j);

                //Determine if the paragraph style is "Heading 1"
                if (paragraph.getStyleName().equals("Heading1")) {

                    //Get the text of the paragraph in "Heading 1"
                    System.out.println("Heading 1: " + paragraph.getText() + "\n");
                }

                //Determine if the paragraph style is "My Custom Style"
                if (paragraph.getStyleName().equals("MyCustomStyle")) {

                    //Get the text of the paragraph in "My Custom Style"
                    System.out.println("My Custom Style: " + paragraph.getText());
                }
            }
        }
    }
}
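
If you need the extracted text for further processing rather than printed to the console, one option is to collect it by style name. The sketch below shows only that collection step and runs without Spire.Doc: StyledText is a hypothetical stand-in for a paragraph's style name and text (the values that Paragraph.getStyleName() and Paragraph.getText() return in the example above), and the sample data is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StyleGrouping {

    // Hypothetical stand-in for a Word paragraph; NOT a Spire.Doc type.
    static final class StyledText {
        final String styleName;
        final String text;

        StyledText(String styleName, String text) {
            this.styleName = styleName;
            this.text = text;
        }
    }

    // Groups paragraph texts by style name, keeping document order within each group.
    static Map<String, List<String>> groupByStyle(List<StyledText> paragraphs) {
        Map<String, List<String>> byStyle = new LinkedHashMap<>();
        for (StyledText p : paragraphs) {
            byStyle.computeIfAbsent(p.styleName, k -> new ArrayList<>()).add(p.text);
        }
        return byStyle;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the paragraphs of a document
        List<StyledText> paragraphs = Arrays.asList(
                new StyledText("Heading1", "Introduction"),
                new StyledText("Normal", "Some body text."),
                new StyledText("MyCustomStyle", "A highlighted note."),
                new StyledText("Heading1", "Conclusion"));

        Map<String, List<String>> byStyle = groupByStyle(paragraphs);
        System.out.println("Heading1: " + byStyle.get("Heading1"));
        System.out.println("MyCustomStyle: " + byStyle.get("MyCustomStyle"));
    }
}
```

With a real document, the same map could be filled inside the nested loops of the example above by passing each paragraph's style name and text instead of the sample values.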


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the function limitations, please request a 30-day trial license.

Last modified on Wednesday, 01 December 2021 08:21