
Sun Nov 26, 2023 11:46 pm

Hello!
I'm writing a Java Swing/AWT application for my Java course project and am currently using the free package.
I have a document that looks roughly like this:

Theme 1
1. question
2. Question
3. Question
...
40. Question
theme 2
...
50. Question

I need to find those themes and extract all the paragraphs between them, since I'll be randomising the questions to create exam tickets.
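The randomising step described above could be sketched in plain Java as follows. This is a minimal sketch, not Spire.Doc code: it assumes the questions have already been extracted and grouped by theme into a Map (theme title to question texts), and the names TicketGenerator and makeTickets are hypothetical. Each theme's list is shuffled once and ticket t takes the t-th question of every theme, so no question repeats across tickets until a theme runs out of questions.

```java
import java.util.*;

// Hypothetical helper: deals exam tickets from questions already grouped by theme.
public class TicketGenerator {

    /** Builds ticketCount tickets, each holding one question per (non-empty) theme. */
    public static List<List<String>> makeTickets(Map<String, List<String>> questionsByTheme,
                                                 int ticketCount, Random random) {
        // Shuffle a copy of each theme's questions once, so the input is left untouched
        List<List<String>> shuffled = new ArrayList<>();
        for (List<String> questions : questionsByTheme.values()) {
            List<String> copy = new ArrayList<>(questions);
            Collections.shuffle(copy, random);
            shuffled.add(copy);
        }

        List<List<String>> tickets = new ArrayList<>();
        for (int t = 0; t < ticketCount; t++) {
            List<String> ticket = new ArrayList<>();
            for (List<String> questions : shuffled) {
                if (!questions.isEmpty()) {
                    // Deal the t-th shuffled question; wrap around only if the theme runs out
                    ticket.add(questions.get(t % questions.size()));
                }
            }
            tickets.add(ticket);
        }
        return tickets;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the themes in document order
        Map<String, List<String>> themes = new LinkedHashMap<>();
        themes.put("Theme 1", List.of("1. Question", "2. Question", "3. Question"));
        themes.put("Theme 2", List.of("41. Question", "42. Question"));

        // A fixed seed makes the generated tickets reproducible
        List<List<String>> tickets = makeTickets(themes, 2, new Random(42));
        for (int t = 0; t < tickets.size(); t++) {
            System.out.println("Ticket " + (t + 1) + ": " + tickets.get(t));
        }
    }
}
```

Shuffling once and dealing by position, rather than drawing independently for every ticket, is one simple way to avoid two tickets getting the same question from a theme.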

I've found some useful links on finding a string or a style (Heading 1, 2, 3...) and on making text bold, but I wonder: is there a way to find only the bold text?

Also, if someone with more experience working with Word documents in Java has a more effective idea than searching for the themes and splitting the document by saved indexes, I'd be glad to hear it.

OtcheRme
 
Posts: 2
Joined: Sun Nov 26, 2023 11:32 pm

Mon Nov 27, 2023 8:55 am

Hi,

Thank you for your inquiry.
I have implemented both of your requirements with the latest commercial version of Spire.Doc for Java.
For your first requirement, you can refer to the following code to find these themes and extract all the paragraphs between them.
```java
import com.spire.doc.*;
import com.spire.doc.documents.Paragraph;

public class ThemeExtractor {
    public static void main(String[] args) {
        // Load the source document
        Document doc1 = new Document();
        doc1.loadFromFile("test.docx");

        // Create the output document and give it one empty section
        Document doc2 = new Document();
        doc2.addSection();

        // Body indexes of the "Theme 1" and "Theme 2" headings (-1 = not found yet)
        int startIndex = -1;
        int endIndex = -1;

        // The extraction below reads from the first section only, so search that section
        Section sec = doc1.getSections().get(0);
        for (int i = 0; i < sec.getParagraphs().getCount(); i++) {
            Paragraph paragraph = sec.getParagraphs().get(i);

            // Compare case-insensitively, since a heading may be typed "Theme 2" or "theme 2"
            String text = paragraph.getText().toLowerCase();
            if (text.contains("theme 1")) {
                // Use the index among all body child objects (not just paragraphs),
                // because the copy loop below indexes getChildObjects(), which also counts tables
                startIndex = sec.getBody().getChildObjects().indexOf(paragraph);
                System.out.println("Start index: " + startIndex);
            } else if (text.contains("theme 2")) {
                endIndex = sec.getBody().getChildObjects().indexOf(paragraph);
                System.out.println("End index: " + endIndex);
            }
        }

        // Extract and save once, after the whole section has been searched
        if (startIndex >= 0 && endIndex > startIndex) {
            extractBetweenThemes(doc1, doc2, startIndex, endIndex);
            doc2.saveToFile("betweenTheme.docx", FileFormat.Docx_2013);
        }
    }

    private static void extractBetweenThemes(Document doc1, Document doc2, int startIndex, int endIndex) {
        // Copy every body object strictly between the two headings (startIndex+1 .. endIndex-1)
        for (int i = startIndex + 1; i < endIndex; i++) {
            // Deep clone the object from the first section of the source document
            DocumentObject obj = doc1.getSections().get(0).getBody().getChildObjects().get(i).deepClone();

            // Append the clone to the first section of the output document
            doc2.getSections().get(0).getBody().getChildObjects().add(obj);
        }
    }
}
```
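The start/end index approach can be generalised to any number of themes by recording the index of every heading and treating each one as the start of a range that ends at the next heading. A minimal sketch, in plain Java so it can be tried without a document: it assumes texts.get(i) holds the text of body child object i (for example, Paragraph.getText() for paragraphs and "" for tables, collected in order from getBody().getChildObjects()), so that the returned indexes line up with the copy loop. The heading pattern "theme" followed by a number, and the names ThemeRanges and findThemeRanges, are illustrative assumptions.

```java
import java.util.*;

// Hypothetical helper: finds the body index range under each theme heading.
public class ThemeRanges {

    /** Assumed heading format: "Theme 1", "theme 2", ... (case-insensitive). */
    static boolean isThemeHeading(String text) {
        return text.trim().toLowerCase().matches("theme\\s+\\d+.*");
    }

    /** Returns {startInclusive, endExclusive} pairs for the content under each heading. */
    public static List<int[]> findThemeRanges(List<String> texts) {
        // First pass: remember where every heading is
        List<Integer> headings = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            if (isThemeHeading(texts.get(i))) {
                headings.add(i);
            }
        }

        // Second pass: each range runs from just after a heading up to the next heading
        List<int[]> ranges = new ArrayList<>();
        for (int h = 0; h < headings.size(); h++) {
            int start = headings.get(h) + 1;
            // The last theme runs to the end of the body
            int end = (h + 1 < headings.size()) ? headings.get(h + 1) : texts.size();
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("Theme 1", "1. Question", "2. Question",
                                     "theme 2", "3. Question");
        for (int[] r : findThemeRanges(texts)) {
            System.out.println("copy child objects " + r[0] + " to " + (r[1] - 1));
        }
    }
}
```

Each {start, end} pair could then be passed to the same deepClone loop as in the code above, copying indexes start through end - 1 into its own output document or section.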

For your second requirement, you can refer to the following code to find bold text.
```java
import com.spire.doc.*;
import com.spire.doc.documents.Paragraph;
import com.spire.doc.fields.TextRange;

public class FindBoldText {
    public static void main(String[] args) {
        // Load the input document
        Document doc = new Document();
        doc.loadFromFile("data/input.docx");

        // Create an output document with one section to collect the bold text
        Document doc1 = new Document();
        Section section = doc1.addSection();

        // Iterate through each section of the input document
        for (int i = 0; i < doc.getSections().getCount(); i++) {
            Section sec = doc.getSections().get(i);

            // Iterate through each paragraph in the current section
            for (int p = 0; p < sec.getParagraphs().getCount(); p++) {
                Paragraph para = sec.getParagraphs().get(p);

                // Iterate through each child object (text runs, fields, pictures, ...) of the paragraph
                for (int j = 0; j < para.getChildObjects().getCount(); j++) {
                    DocumentObject child = para.getChildObjects().get(j);

                    // Only text ranges carry the character formatting we test for bold
                    if (child instanceof TextRange) {
                        TextRange range = (TextRange) child;
                        if (range.getCharacterFormat().getBold()) {
                            System.out.println("Bold text in this paragraph: " + range.getText());

                            // Copy the bold text into a new paragraph of the output document
                            section.addParagraph().appendText(range.getText());
                        }
                    }
                }
            }
        }
        // Save the output document to result.docx
        doc1.saveToFile("result.docx", FileFormat.Docx);
    }
}
```
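The loop above writes every bold TextRange as a separate paragraph. Word may split one bold phrase into several runs (for example after editing), so a single heading could come out in pieces. A minimal sketch of merging consecutive bold runs, in plain Java: it assumes the runs of one paragraph have been collected in order as (text, bold) pairs from range.getText() and range.getCharacterFormat().getBold(); the Run holder class and the names BoldRuns and mergeBoldRuns are hypothetical.

```java
import java.util.*;

// Hypothetical helper: joins adjacent bold runs of one paragraph into whole fragments.
public class BoldRuns {

    /** Hypothetical holder for one run's text and its bold flag. */
    static final class Run {
        final String text;
        final boolean bold;

        Run(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    /** Concatenates consecutive bold runs; a non-bold run ends the current fragment. */
    public static List<String> mergeBoldRuns(List<Run> runs) {
        List<String> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Run run : runs) {
            if (run.bold) {
                current.append(run.text);
            } else if (current.length() > 0) {
                fragments.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a bold fragment that reaches the end of the paragraph
        if (current.length() > 0) {
            fragments.add(current.toString());
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(new Run("The", true), new Run("me 1", true),
                                 new Run(" (intro)", false), new Run("Note", true));
        System.out.println(mergeBoldRuns(runs)); // prints [Theme 1, Note]
    }
}
```

Each merged fragment could then be passed to section.addParagraph().appendText(...) in place of the individual runs.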

If the issue persists, please provide your input file so that we can investigate further. You can attach it here or send it to us by email ([email protected]). Thank you in advance.

Sincerely,
Ula
E-iceblue support team

Ula.wang
 
Posts: 282
Joined: Mon Aug 07, 2023 1:38 am

Sun Dec 03, 2023 9:28 pm

Hello
Thanks for your answer; it really helped me understand the Word document structure and the Spire methods.
It was very interesting and helpful. For now, though, I'll stick with a simple string search and the getText method, since I'm worried about minor problems cropping up and I don't have much time. But I'll be sure to use your library if I ever need to work with .docx files again, and I'm sure others will find this thread helpful ^^
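The simple string-search approach mentioned here can be done in a single pass over the paragraph texts (for example, the getText() result of each paragraph in order), grouping every non-blank paragraph under the most recent theme heading. A minimal sketch in plain Java, assuming headings look like "Theme 1" or "theme 2"; the names ThemeGrouper and groupByTheme are hypothetical. It deliberately does not require a "1." prefix on questions, since automatic list numbering may not be part of the paragraph text.

```java
import java.util.*;
import java.util.regex.Pattern;

// Hypothetical helper: groups question paragraphs by the theme heading above them.
public class ThemeGrouper {

    // Assumed heading format: "Theme" (any case), whitespace, then a number
    private static final Pattern THEME = Pattern.compile("(?i)theme\\s+\\d+.*");

    /** Maps each theme heading to the non-blank paragraphs that follow it, in document order. */
    public static Map<String, List<String>> groupByTheme(List<String> paragraphs) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null;
        for (String text : paragraphs) {
            String trimmed = text.trim();
            if (THEME.matcher(trimmed).matches()) {
                // Start a new group for this heading
                current = new ArrayList<>();
                result.put(trimmed, current);
            } else if (current != null && !trimmed.isEmpty()) {
                current.add(trimmed);
            }
            // Blank paragraphs and text before the first heading are skipped
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paragraphs = List.of("Theme 1", "1. Question", "2. Question",
                                          "", "theme 2", "41. Question");
        // prints {Theme 1=[1. Question, 2. Question], theme 2=[41. Question]}
        System.out.println(groupByTheme(paragraphs));
    }
}
```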

Have a great day

OtcheRme
 

Mon Dec 04, 2023 7:06 am

Hi,

Thank you for your feedback.
If you have any other questions, feel free to write back. I hope you have a wonderful day.

Sincerely,
Ula
E-iceblue support team

Ula.wang
 
