Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files. Get free and professional technical support for Spire.Doc for .NET, Java, Android, C++, Python.

Thu Jun 09, 2022 11:26 am

I would like to search within a document for two text marks, for example "<start>" and "<end>" and duplicate all the content between them (paragraphs, tables, images, etc.) right after. The content could even contain multiple pages.
I know how to do the search with findAllPattern, but not how to duplicate all the content between both marks.

I am using java.

Thanks.

jmparada
 
Posts: 47
Joined: Wed May 25, 2022 7:50 am

Fri Jun 10, 2022 5:55 am

Hello,

To duplicate all the content between the two marks, you can use a bookmark. Here is an example solution for your reference. Please give it a try and get back to us if you have any questions.
Code:
        Document doc = new Document();
        doc.loadFromFile("test.docx");

        //Find the two marks
        TextSelection startSelection = doc.findString("<start>", false, true);
        TextSelection endSelection = doc.findString("<end>", false, true);
        //Stop early if either mark is missing
        if (startSelection == null || endSelection == null) {
            throw new IllegalStateException("Could not find both <start> and <end> in the document");
        }
        TextRange startRange = startSelection.getAsOneRange();
        TextRange endRange = endSelection.getAsOneRange();
        Paragraph startPara = startRange.getOwnerParagraph();
        Paragraph endPara = endRange.getOwnerParagraph();
        int startIndex = startPara.getChildObjects().indexOf(startRange);
        //Add a bookmark to include the content between them
        //Create bookmark tags
        BookmarkStart bookmarkStart = new BookmarkStart(doc, "test_bookmark");
        BookmarkEnd bookmarkEnd = new BookmarkEnd(doc, "test_bookmark");
        //Insert the bookmark start right after the <start> mark
        if (startIndex == startPara.getChildObjects().getCount() - 1) {
            startPara.getChildObjects().add(bookmarkStart);
        } else {
            startPara.getChildObjects().insert(startIndex + 1, bookmarkStart);
        }
        //Look up the <end> index only now: if both marks share a paragraph,
        //inserting bookmarkStart has shifted the position of endRange by one
        int endIndex = endPara.getChildObjects().indexOf(endRange);
        endPara.getChildObjects().insert(endIndex, bookmarkEnd);

        //Get bookmark content
        BookmarksNavigator navigator = new BookmarksNavigator(doc);
        navigator.moveToBookmark("test_bookmark");

        TextBodyPart body = navigator.getBookmarkContent();
        //Now you can get all objects between the keywords from body.getBodyItems()
        for (int i = 0; i < body.getBodyItems().getCount(); i++) {
            DocumentObject obj = body.getBodyItems().get(i);
            //Paragraph
            if (obj.getDocumentObjectType().equals(DocumentObjectType.Paragraph)) {
                //.....
            }
            //Other elements
        }

        //Remove the bookmark
        startPara.getChildObjects().remove(bookmarkStart);
        endPara.getChildObjects().remove(bookmarkEnd);
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am

Fri Jun 10, 2022 8:23 am

I appreciate the explanation and code provided. I would need to extend that code with the rest of the operations. I don't know what types of objects there are or how to duplicate and place them.
Thanks.

jmparada
 
Posts: 47
Joined: Wed May 25, 2022 7:50 am

Fri Jun 10, 2022 8:47 am

You can use DocumentObject.deepClone() to get a copy of an element (such as a Paragraph) and add this copy somewhere else. For example:
Code:
        for (int i = 0; i < body.getBodyItems().getCount(); i++) {
            DocumentObject obj = body.getBodyItems().get(i);
            //Paragraph
            if (obj.getDocumentObjectType().equals(DocumentObjectType.Paragraph)) {
                //.....
                DocumentObject copyPara = obj.deepClone();
                //Add this copy somewhere else,
                //such as the end of the document
                doc.getLastSection().getBody().getChildObjects().add(copyPara);
            }
            //Other elements
        }

You can find all the types of document objects in the enum "DocumentObjectType". Pick the types you need to handle, then use if or switch on DocumentObject.getDocumentObjectType() to decide how each object is processed.
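
The if/switch dispatch described above can be sketched in plain Java. This is only an illustration of the pattern and does not use Spire.Doc: NodeKind, Node, deepCopy and copySupported are hypothetical stand-ins for the library's enum, its document objects and its clone method.

```java
import java.util.ArrayList;
import java.util.List;

final class TypeDispatchSketch {
    // Hypothetical stand-in for the library's DocumentObjectType enum.
    enum NodeKind { PARAGRAPH, TABLE, OTHER }

    // Hypothetical stand-in for a document object that can be cloned.
    static final class Node {
        final NodeKind kind;
        final String text;

        Node(NodeKind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        Node deepCopy() {
            return new Node(kind, text);
        }
    }

    // Clones only the kinds this sketch chooses to handle and skips the rest.
    static List<Node> copySupported(List<Node> items) {
        List<Node> copies = new ArrayList<>();
        for (Node item : items) {
            switch (item.kind) {
                case PARAGRAPH:
                case TABLE:
                    copies.add(item.deepCopy());
                    break;
                default:
                    // Unhandled kinds are ignored here; real code may need more cases.
                    break;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        List<Node> items = new ArrayList<>();
        items.add(new Node(NodeKind.PARAGRAPH, "some text"));
        items.add(new Node(NodeKind.OTHER, "skipped"));
        System.out.println(copySupported(items).size());
    }
}
```

Which kinds to handle depends on what the documents actually contain; the two cases above are an arbitrary choice for the sketch.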
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am

Wed Jun 15, 2022 12:12 pm

The code seems to work fine. However, I need the duplicated content to be inserted at the <end> position, not at the end of the document. The content may also need to be copied multiple times.

Example1 (repeat 3 times):
Prueba de texto <start> 12345 <end> .....

Result:
Prueba de texto 12345 12345 12345 .....

Example2 (repeat 2 times):
prueba de texto
<start>
12345
<end>
.....

Result:
prueba de texto
12345
12345
.....
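
The expected behaviour in these examples can be modelled on a plain string, independent of Spire.Doc. The sketch below is only that model: MarkerRepeat and repeatBetween are hypothetical names, and whitespace around the content is copied exactly as it appears between the marks.

```java
final class MarkerRepeat {
    // Replaces the first startMark...endMark block with `times` copies of the
    // text between the marks. The marks themselves are removed.
    static String repeatBetween(String text, String startMark, String endMark, int times) {
        int start = text.indexOf(startMark);
        if (start < 0) {
            return text; // no start mark: nothing to do
        }
        int contentStart = start + startMark.length();
        int end = text.indexOf(endMark, contentStart);
        if (end < 0) {
            return text; // no end mark after the start mark
        }
        String content = text.substring(contentStart, end);
        StringBuilder out = new StringBuilder(text.substring(0, start));
        for (int k = 0; k < times; k++) {
            out.append(content);
        }
        out.append(text.substring(end + endMark.length()));
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "abbbc"
        System.out.println(repeatBetween("a<start>b<end>c", "<start>", "<end>", 3));
    }
}
```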

jmparada
 
Posts: 47
Joined: Wed May 25, 2022 7:50 am

Thu Jun 16, 2022 8:19 am

Hi,

Please try the following modified code snippet.

Code:
        TextBodyPart body = navigator.getBookmarkContent();
        TextBodyPart newBody = new TextBodyPart(doc);
        //Now you can get all objects between the keywords from body.getBodyItems()
        //Repeat 3 times
        for (int k = 1; k <= 3; k++) {
            //Clone all content to newBody
            for (int i = 0; i < body.getBodyItems().getCount(); i++) {
                DocumentObject obj = body.getBodyItems().get(i);
                newBody.getBodyItems().add(obj.deepClone());
            }
        }
        //replace the whole body
        navigator.replaceBookmarkContent(newBody);
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am

Fri Jun 17, 2022 1:22 pm

I have tested the code and it works fine for the test case. However, I have found other additional problems.

1.- The cloned body seems to include an additional paragraph break: in the result there is one extra paragraph break between each duplicated block.

2.- The block to duplicate may sit inside a single paragraph, for example "text <start> 12345 <end> text". In this case it doesn't work for me. I've also tried "<start> 12345 <end>", "<start> 12345 <end> text", "text <start> 12345 <end>" and so on.

3.- Another curious case: if I use the example "text <start> 12345 <end>" with several blank spaces around "12345", the start bookmark is placed correctly, right after "<start>", but the end bookmark is placed right after the "5". The spaces between the "5" and the beginning of "<end>" are therefore not included.

Thank you very much for your assistance.

jmparada
 
Posts: 47
Joined: Wed May 25, 2022 7:50 am

Mon Jun 20, 2022 7:51 am

I can confirm problem number 1:
It seems that the cloned body includes an additional paragraph break, because in the result there is one extra paragraph break between each duplicated block.


{{ repeaterStart }}
Info
Some more info
{{ repeaterEnd }}


results in:

Info
Some more info


Info
Some more info


Note the two line breaks in between. I think one of them comes from the placeholder and the other one is added for some reason.
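
The desired outcome for this case can be modelled in plain Java, assuming the document is simply a list of paragraph strings and each placeholder occupies a paragraph of its own. The sketch does not use Spire.Doc and does not explain where the library's extra break comes from; ParagraphRepeat and repeatBlock are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ParagraphRepeat {
    // Drops the two placeholder paragraphs and inserts `times` copies of the
    // paragraphs between them, so no empty paragraph is left behind.
    static List<String> repeatBlock(List<String> paragraphs, String startMark,
                                    String endMark, int times) {
        int start = paragraphs.indexOf(startMark);
        int end = paragraphs.indexOf(endMark);
        if (start < 0 || end <= start) {
            return new ArrayList<>(paragraphs); // marks missing or out of order
        }
        List<String> block = paragraphs.subList(start + 1, end);
        List<String> result = new ArrayList<>(paragraphs.subList(0, start));
        for (int k = 0; k < times; k++) {
            result.addAll(block);
        }
        result.addAll(paragraphs.subList(end + 1, paragraphs.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> doc = new ArrayList<>();
        doc.add("{{ repeaterStart }}");
        doc.add("Info");
        doc.add("Some more info");
        doc.add("{{ repeaterEnd }}");
        for (String p : repeatBlock(doc, "{{ repeaterStart }}", "{{ repeaterEnd }}", 2)) {
            System.out.println(p);
        }
    }
}
```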

GM_SpireUser
 
Posts: 1
Joined: Mon Jun 20, 2022 7:23 am

Mon Jun 20, 2022 6:30 pm

I have found a way to solve practically everything. I enclose the document "TestWordForos.docx", which contains different test cases for content to be duplicated: content inside a paragraph, a single whole paragraph, several paragraphs, and several paragraphs with various kinds of content.

To solve the problems, I have had to duplicate the content in different ways depending on whether everything to be duplicated is in the same paragraph or spread over several. I have also had to vary the position of the bookmarks to avoid the additional line breaks; I still don't know why they are inserted.

The only remaining problem is when the content to be duplicated is a single table. In this case I have not been able to place the bookmark: I get an error indicating that the bookmark cannot be placed in a table. If there is no other way, I will have to live with that limitation.

I am attaching the resulting Word document and the Java code. I would appreciate it if you could review the code in case something is not right or needs to be changed.

Thank you

jmparada
 
Posts: 47
Joined: Wed May 25, 2022 7:50 am

Tue Jun 21, 2022 8:11 am

Hi,

Thanks for your efforts on this issue.

I'm thinking of a way that satisfies all the situations you described. Since you have also shared your code and ideas, I will try them out and see what can be improved. Thanks again for your continued efforts.
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am

Wed Jun 22, 2022 8:29 am

Hi,

I found a solution for this case. I attached my full test code, explanations, input file and output file for your reference. If there is still something in the code that isn't taken into account, please point it out. Thank you for your patience and assistance.
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am

Thu Jun 23, 2022 3:45 pm

I have verified that your code is better than mine and, as you say, it covers all the cases better.
In my tests I did get one error, which also happened with my code. It occurs when <start> and <end> are in different paragraphs and neither mark occupies a paragraph on its own.
For example:

<start> texto1 texto2
texto3 texto4
texto5 <end>

Additionally, to cover my full requirement, each repetition needs a marker indicating where a new row begins. In my case I use "<row>", so for the previous example the result with 3 repetitions would be:

<row> texto1 texto2
texto3 texto4
texto5
<row> texto1 texto2
texto3 texto4
texto5
<row> texto1 texto2
texto3 texto4
texto5

I haven't been able to do this in your code.

If you can look at these two problems for me, I would appreciate it.

jmparada
 
Posts: 47
Joined: Wed May 25, 2022 7:50 am

Thu Jun 23, 2022 6:29 pm

After further testing, I have seen that the same problem also occurs when there are section changes. From what I can see of how the code works, it first duplicates the part that follows "start", then the part that precedes "end", and then what lies between the two. As a result, the duplicate is not in the same order as the original.
In the example above:

<start> texto1 texto2
texto3 texto4
texto5 <end>

Whether or not there are different sections, the result would be (two repetitions):

texto1 texto2 texto1 texto2
texto3 texto4
texto3 texto4
texto5 texto5

when it should be:

texto1 texto2
texto3 texto4
texto5
texto1 texto2
texto3 texto4
texto5
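
The expected ordering can be modelled in plain Java by treating everything between the marks as one block and repeating that block as a whole, rather than duplicating each part in place. This is only a sketch on a list of paragraph strings and does not use Spire.Doc. OrderedRepeat, repeatInOrder and the rowPrefix parameter are hypothetical, and the sketch assumes that each repetition starts on its own paragraph and that whitespace at the edges of the block is trimmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OrderedRepeat {
    // Repeats the whole block between the marks `times` times, in original
    // order. Each copy is prefixed with rowPrefix (may be empty).
    static List<String> repeatInOrder(List<String> paragraphs, String startMark,
                                      String endMark, int times, String rowPrefix) {
        // Join paragraphs so the block can span paragraph boundaries.
        String text = String.join("\n", paragraphs);
        int start = text.indexOf(startMark);
        if (start < 0) {
            return new ArrayList<>(paragraphs);
        }
        int contentStart = start + startMark.length();
        int end = text.indexOf(endMark, contentStart);
        if (end < 0) {
            return new ArrayList<>(paragraphs);
        }
        // Assumption of this sketch: edge whitespace and breaks are dropped.
        String block = text.substring(contentStart, end).trim();
        StringBuilder out = new StringBuilder(text.substring(0, start));
        for (int k = 0; k < times; k++) {
            if (k > 0) {
                out.append('\n'); // each further copy starts a new paragraph
            }
            out.append(rowPrefix).append(block);
        }
        out.append(text.substring(end + endMark.length()));
        return new ArrayList<>(Arrays.asList(out.toString().split("\n", -1)));
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "<start> texto1 texto2", "texto3 texto4", "texto5 <end>");
        for (String p : repeatInOrder(doc, "<start>", "<end>", 2, "")) {
            System.out.println(p);
        }
    }
}
```

Passing a non-empty rowPrefix such as "<row> " puts a row marker at the beginning of every repetition.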

jmparada
 
Posts: 47
Joined: Wed May 25, 2022 7:50 am

Mon Jun 27, 2022 6:38 am

Hi,

Thanks for your efforts.

I'll try it and get back to you soon.
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am

Fri Jul 01, 2022 8:41 am

Hi,

Thanks for your patience, and sorry for the late reply.

Your analysis is very rigorous and meticulous, which is very good. While the ideas you presented were challenging for me, I enjoyed trying them out.
For the cases you mentioned last time, I have refined and improved the previous code and added comments where important.
I tried all the cases you mentioned before, including the one where <start> and <end> are in different sections. They all work fine, but you might still find something I overlooked.
Please test again. Looking forward to your feedback.
Sincerely,
Andy
E-iceblue support team

Andy.Zhou
 
Posts: 483
Joined: Mon Mar 29, 2021 3:03 am
