Removing text between two keywords

Technical support for Spire.Doc

Post by mystcreater » Thu Jun 15, 2017 3:14 pm

Hi,

I have a Word document in which the user can enter a specific keyword in two places, and I want to use Spire.Doc to remove all of the text between these two keywords.

For example:
Code: Select all
bla bla bla
{{Test}}
bla bla bla bla
{{Test}}
bla bla bla


In this example, I want to be able to remove everything between the two {{Test}} keywords in some cases, and in other cases remove only the keywords themselves.

Note that I don't know in advance what the user will type between the two keywords.

How can I do that?

Thank you.

Post by Jane.Bai » Fri Jun 16, 2017 7:22 am

Hello,

Thanks for your inquiry.
Please use the following code:
Code: Select all
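        // Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
        // and the Spire.Doc namespaces (Spire.Doc, Spire.Doc.Documents, Spire.Doc.Collections, Spire.Doc.Interface).
        private void button10_Click(object sender, EventArgs e)
        {
            Document doc = new Document();
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\10853.docx");

            // Escape the braces and ignore case so the pattern also matches "{{Test}}".
            TextSelection[] ts = doc.FindAllPattern(new Regex(Regex.Escape("{{test}}"), RegexOptions.IgnoreCase));
            if (ts == null || ts.Length < 2)
                return; // need both the opening and the closing keyword

            DocumentObject start = ts[0].GetAsOneRange();
            DocumentObject end = ts[1].GetAsOneRange();
            RemoveRange(start, end);
            doc.SaveToFile(@"C:\Users\Administrator\Desktop\10853result.docx", FileFormat.Docx);
        }

        // Removes everything between start and end; the two objects themselves are kept.
        public static void RemoveRange(DocumentObject start, DocumentObject end)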
        {
            // Collect end and all of its ancestors so the forward pass knows where to stop.
            HashSet<DocumentObject> endElements = new HashSet<DocumentObject>();
            DocumentObject parent = end;
            while (parent != null)
            {
                endElements.Add(parent);
                parent = parent.Owner;
            }

            // Forward pass: delete the siblings that follow start, climbing one level at a time.
            parent = start.Owner;
            DocumentObject current = start;
            DocumentObject lastStart = start;
            while (parent != null)
            {
                ICompositeObject container = (parent as ICompositeObject);
                DocumentObjectCollection objs = container.ChildObjects;
                int index = objs.IndexOf(current) + 1;
                while (objs.Count > index)
                {
                    DocumentObject element = objs[index];
                    if (endElements.Contains(element))
                    {
                        parent = null;
                        lastStart = current;
                        break;
                    }
                    objs.RemoveAt(index);
                }

                if (parent != null)
                {
                    if (parent.DocumentObjectType == DocumentObjectType.Body)
                    {
                        lastStart = parent.Owner;
                        break;
                    }
                    current = parent;
                    parent = parent.Owner;
                }
            }

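            // Backward pass: delete the siblings that precede end, until the branch containing start is reached.
            parent = end.Owner;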
            current = end;
            while (parent != null)
            {
                ICompositeObject container = (parent as ICompositeObject);
                DocumentObjectCollection objs = container.ChildObjects;
                int index = objs.IndexOf(current) - 1;
                while (index >= 0)
                {
                    DocumentObject element = objs[index];
                    if (lastStart == element)
                    {
                        parent = null;
                        break;
                    }
                    objs.RemoveAt(index);
                    index--;
                }

                if (parent != null)
                {
                    current = parent;
                    parent = parent.Owner;
                }
            }
        }

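A side note on the pattern: the keyword contains braces, which are regex metacharacters, and users may type it with different capitalization. One way to guard against both is to build the pattern with Regex.Escape and RegexOptions.IgnoreCase. The sketch below is a standalone illustration on a plain string, not on a Word document; the KeywordPattern class and its Build method are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordPattern
{
    // Builds a regex that matches the keyword literally, ignoring case.
    public static Regex Build(string keyword)
    {
        return new Regex(Regex.Escape(keyword), RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Regex pattern = Build("{{test}}");
        string text = "bla bla bla\n{{Test}}\nbla bla bla bla\n{{Test}}\nbla bla bla";

        // Counts the keyword occurrences found in the sample text.
        Console.WriteLine(pattern.Matches(text).Count); // prints 2
    }
}
```

A Regex built this way can be passed to doc.FindAllPattern in place of a hand-written pattern.
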

If you still have any issues, please feel free to write back.

Sincerely,
Jane
E-iceblue support team

Post by Jane.Bai » Mon Jun 19, 2017 6:24 am

Hello,

Have you tried the solution I provided? How is your issue now? Could you please give us some feedback at your convenience?

Sincerely,
Jane
E-iceblue support team

Post by mystcreater » Mon Jun 19, 2017 2:12 pm

Hi,

Thanks for the answer, but it doesn't fully work.

It removes the content between the keywords, but not the keywords themselves.

For example, the result using your code is:

Code: Select all
bla bla bla
{{Test}}
{{Test}}
bla bla bla


Instead of:

Code: Select all
bla bla bla
bla bla bla


Also, I suspect the algorithm does not work when I have multiple ranges to delete.

Why don't you have a built-in feature in the API for this type of replacement?

Post by Jane.Bai » Tue Jun 20, 2017 9:32 am

Hello,

Thanks for your response.
Please use the following code to delete the keywords themselves as well as the text between them.
Code: Select all
 
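        private void button10_Click(object sender, EventArgs e)
        {
            Document doc = new Document();
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\10853.docx");

            // Escape the braces and ignore case so the pattern also matches "{{Test}}".
            TextSelection[] ts = doc.FindAllPattern(new Regex(Regex.Escape("{{test}}"), RegexOptions.IgnoreCase));
            if (ts == null || ts.Length < 2)
                return; // need both the opening and the closing keyword

            DocumentObject start = ts[0].GetAsOneRange();
            DocumentObject end = ts[1].GetAsOneRange();

            // Remove the text between the two keywords.
            RemoveRange(start, end);

            // These two lines delete the keywords themselves.
            // To delete only the keywords, skip the RemoveRange call above.
            start.Owner.ChildObjects.Remove(start);
            end.Owner.ChildObjects.Remove(end);
            doc.SaveToFile(@"C:\Users\Administrator\Desktop\10853result.docx", FileFormat.Docx);
        }

        public static void RemoveRange(DocumentObject start, DocumentObject end)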
        {
            // Collect end and all of its ancestors so the forward pass knows where to stop.
            HashSet<DocumentObject> endElements = new HashSet<DocumentObject>();
            DocumentObject parent = end;
            while (parent != null)
            {
                endElements.Add(parent);
                parent = parent.Owner;
            }

            // Forward pass: delete the siblings that follow start, climbing one level at a time.
            parent = start.Owner;
            DocumentObject current = start;
            DocumentObject lastStart = start;
            while (parent != null)
            {
                ICompositeObject container = (parent as ICompositeObject);
                DocumentObjectCollection objs = container.ChildObjects;
                int index = objs.IndexOf(current) + 1;
                while (objs.Count > index)
                {
                    DocumentObject element = objs[index];
                    if (endElements.Contains(element))
                    {
                        parent = null;
                        lastStart = current;
                        break;
                    }
                    objs.RemoveAt(index);
                }

                if (parent != null)
                {
                    if (parent.DocumentObjectType == DocumentObjectType.Body)
                    {
                        lastStart = parent.Owner;
                        break;
                    }
                    current = parent;
                    parent = parent.Owner;
                }
            }

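            // Backward pass: delete the siblings that precede end, until the branch containing start is reached.
            parent = end.Owner;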
            current = end;
            while (parent != null)
            {
                ICompositeObject container = (parent as ICompositeObject);
                DocumentObjectCollection objs = container.ChildObjects;
                int index = objs.IndexOf(current) - 1;
                while (index >= 0)
                {
                    DocumentObject element = objs[index];
                    if (lastStart == element)
                    {
                        parent = null;
                        break;
                    }
                    objs.RemoveAt(index);
                    index--;
                }

                if (parent != null)
                {
                    current = parent;
                    parent = parent.Owner;
                }
            }
        }

As for the multiple-range issue, I can successfully delete the range between the two keywords. Do you mean multiple keywords, or something else? Please share more details, and we will provide a solution accordingly.
In addition, we will consider making the RemoveRange function a built-in feature.

Sincerely,
Jane
E-iceblue support team

Post by mystcreater » Wed Jun 21, 2017 6:30 pm

Sorry for taking so long to reply. I forgot to check the box to receive a notification when a reply is posted, and this option is off by default in my account. May I suggest that you turn it on by default for new users? It's too easy to forget to come back to your website to see whether someone has replied.

Your solution is almost perfect. (See my last question at the end of this reply.)

The problem I mentioned with multiple ranges in your first answer turned out not to be a bug. Because the keywords were not removed, the algorithm seemed to loop forever, which is no longer the case now that the keywords themselves are removed.

Just to show what I mean by multiple ranges, here is an example that now works with your second answer:

Code: Select all
{{Section1}}
Introduction
{{Section1}}
Content
{{Section2}}
Conclusion
{{Section2}}

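For illustration, the pairing of keywords in an example like this can be sketched on plain text. This is only a sketch under assumptions: it works on a string rather than on Spire.Doc objects, it assumes keywords are word characters wrapped in double braces, and KeywordPairs and Find are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeywordPairs
{
    // Pairs each keyword with the next occurrence of the same keyword name.
    public static List<Tuple<Match, Match>> Find(string text)
    {
        var pairs = new List<Tuple<Match, Match>>();
        var open = new Dictionary<string, Match>(); // keywords seen once, waiting for their partner

        foreach (Match m in Regex.Matches(text, @"\{\{(\w+)\}\}"))
        {
            string name = m.Groups[1].Value;
            Match start;
            if (open.TryGetValue(name, out start))
            {
                pairs.Add(Tuple.Create(start, m)); // second occurrence closes the range
                open.Remove(name);
            }
            else
            {
                open[name] = m; // first occurrence opens the range
            }
        }
        return pairs;
    }

    public static void Main()
    {
        string text = "{{Section1}}\nIntroduction\n{{Section1}}\nContent\n{{Section2}}\nConclusion\n{{Section2}}";
        foreach (var pair in Find(text))
        {
            // Prints the keyword and the character positions of its two occurrences.
            Console.WriteLine("{0}: {1}..{2}", pair.Item1.Value, pair.Item1.Index, pair.Item2.Index);
        }
    }
}
```

With Spire.Doc, each pair would correspond to a start range and an end range passed to RemoveRange.
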

Yes, I would be happy to have this replace feature in your library in the future, but it is no longer something that bothers me. :)

However, I have one last question about the replacement:
I want to be able to remove the whole line on which the keyword is located.

Following the previous example, here is the result after using your code:
Code: Select all
Introduction

Content

Conclusion


But I would like something like this:
Code: Select all
Introduction
Content
Conclusion


Thank you again for your great support!

Post by mystcreater » Wed Jun 21, 2017 6:47 pm

While searching through the examples on your website, I found this post:
https://www.e-iceblue.com/Tutorials/Spire.Doc/Spire.Doc-Program-Guide/Paragraph/How-to-remove-empty-lines-from-the-word-document-in-C.html

From there, I tried this solution and it works:
Code: Select all
      private void RemoveRangeKeyword(TextRange range)
      {
         if (range.Text == range.OwnerParagraph.Text)
            range.Owner.Owner.ChildObjects.Remove(range.OwnerParagraph);
         else
            range.Owner.ChildObjects.Remove(range);
      }


As I understand it, when range.Text is the same as the text of the owner paragraph, the range is alone on its line. Otherwise, the range shares the line with other text.
Removing the owner paragraph deletes the whole line. In the other case, only the range itself is removed.

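The same decision can be sketched on plain lines of text, which makes the two cases easier to see. This is a simplified model, not Spire.Doc code: each string stands for a paragraph, and KeywordLines and RemoveKeyword are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordLines
{
    // Drops lines that consist only of the keyword; strips the keyword from other lines.
    public static List<string> RemoveKeyword(IEnumerable<string> lines, string keyword)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            if (line == keyword)
                continue; // keyword alone on its line: remove the whole line

            result.Add(line.Replace(keyword, "")); // keyword mixed with text: remove only the keyword
        }
        return result;
    }

    public static void Main()
    {
        var lines = new[] { "{{Test}}", "Introduction", "{{Test}}Content" };
        Console.WriteLine(string.Join("|", RemoveKeyword(lines, "{{Test}}"))); // prints Introduction|Content
    }
}
```
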
Is my solution good?

Are there any situations where range.OwnerParagraph returns null even though range is not null?

Thanks

Post by Jane.Bai » Thu Jun 22, 2017 2:45 am

Hello mystcreater,

Your solution looks good! I'm glad to hear that your issue is resolved.
Regarding your question: if the text range is not null, range.OwnerParagraph will not return null.
We will also adopt your suggestion about the reply notification. Thank you very much for the feedback.
Please feel free to contact us if you need any further assistance.

Best wishes,
Jane
E-iceblue support team

Post by mystcreater » Thu Jun 22, 2017 1:23 pm

In fact, it is not quite true that the owner is never null: I found such a case yesterday.

If the range was previously removed, the range object itself is still not null, but its Owner property is null.

You may ask: if you already removed it, why would you try to remove it again?

I have ranges that can be sub-ranges (ranges inside other ranges). When the parent range is removed, the inner ranges are still not null (of course), but their Owner properties are null.

So I have to do this just before trying to remove a range:

Code: Select all
         if (StartRange.Owner == null || StartRange.Owner.Owner == null)
            return;


Here is my full case:

Code: Select all
         if (StartRange.Owner == null || StartRange.Owner.Owner == null)
            return;

         RemoveRange(StartRange, EndRange);

         // The same check as in RemoveRangeKeyword, with a null guard on OwnerParagraph.
         if (range.OwnerParagraph != null && range.Text == range.OwnerParagraph.Text)
            range.Owner.Owner.ChildObjects.Remove(range.OwnerParagraph);
         else
            range.Owner.ChildObjects.Remove(range);


Again, I have to check both Owner and Owner.Owner, because the removal goes through Owner.Owner when the range is alone on its line and through Owner when it shares the line with other text.
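
To illustrate why a removed range can still have a non-null Owner, here is a toy tree model. It is not Spire.Doc code: Node, TreeCheck and IsAttached are hypothetical names, and the assumption that removing a node clears only that node's Owner is based on the behavior described above. Walking up the Owner chain is one possible way to test whether an object is still part of the document without knowing at which level the removal happened.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a document tree; it is not a Spire.Doc type.
public class Node
{
    public string Name;
    public Node Owner;
    public List<Node> Children = new List<Node>();

    public Node(string name) { Name = name; }

    public Node Add(Node child)
    {
        child.Owner = this;
        Children.Add(child);
        return child;
    }

    public void Remove(Node child)
    {
        if (Children.Remove(child))
            child.Owner = null; // only the removed node loses its owner
    }
}

public static class TreeCheck
{
    // True if walking up the Owner chain from node reaches root.
    public static bool IsAttached(Node node, Node root)
    {
        for (Node n = node; n != null; n = n.Owner)
        {
            if (n == root)
                return true;
        }
        return false;
    }

    public static void Main()
    {
        var body = new Node("body");
        var paragraph = body.Add(new Node("paragraph"));
        var range = paragraph.Add(new Node("range"));

        body.Remove(paragraph);

        Console.WriteLine(range.Owner != null);     // True: the range still points at its paragraph
        Console.WriteLine(IsAttached(range, body)); // False: the paragraph is no longer in the body
    }
}
```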

Post by Jane.Bai » Fri Jun 23, 2017 1:48 am

Hello mystcreater,

Thank you for sharing your valuable findings; I believe they will be very helpful to others.
Feel free to post if you need any help or have further suggestions.

Sincerely,
Jane
E-iceblue support team