Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files. Get free and professional technical support for Spire.Doc for .NET, Java, Android, C++, Python.

Wed Dec 09, 2020 1:34 pm

Hello dear E-ICE-BLUE,

We use Spire.Doc to extract text from Word documents, and it works quite nicely. :D
Now we have customers who send us documents full of fields and IF conditions. If we use the GetText() method, we see all this "if" stuff in the extract. If we copy and paste from Word into a text editor, we get the correct text. What can we do to extract the final text?
Thanks in advance and best regards
Friedhelm

fhellerhoff
 
Posts: 6
Joined: Mon Jan 17, 2011 2:12 pm

Thu Dec 10, 2020 7:13 am

Hello,

Thanks for your inquiry.
I created a sample Word file, tested your scenario, and reproduced the issue. I have posted it to our Dev team with the ticket SPIREDOC-5320. If there is any update, we will let you know. Sorry for the inconvenience caused.
In the meantime, there is a workaround, shown below. You can give it a try.
Code:
            using System.IO;
            using System.Text;
            using Spire.Doc;
            using Spire.Doc.Collections;
            using Spire.Doc.Documents;
            using Spire.Doc.Fields;

            StringBuilder sb = new StringBuilder();
            Document doc = new Document();
            doc.LoadFromFile("fieldandconditions.docx");

            // Note: only the body paragraphs of the first section are processed.
            Section section = doc.Sections[0];
            ParagraphCollection paragraphs = section.Body.Paragraphs;

            foreach (Paragraph paragraph in paragraphs)
            {
                DocumentObjectCollection childObjects = paragraph.ChildObjects;
                foreach (DocumentObject chobj in childObjects)
                {
                    if (chobj.DocumentObjectType == DocumentObjectType.Field)
                    {
                        // For a field, append its field text.
                        Field field = chobj as Field;
                        sb.Append(field.FieldText);
                    }
                    else if (chobj.DocumentObjectType == DocumentObjectType.TextRange)
                    {
                        // Plain text run: append it unchanged.
                        TextRange textRange = chobj as TextRange;
                        sb.Append(textRange.Text);
                    }
                }
                // One output line per paragraph.
                sb.AppendLine();
            }
            File.WriteAllText("result.txt", sb.ToString());


Sincerely,
Brian
E-iceblue support team

Brian.Li
 
Posts: 1271
Joined: Mon Oct 19, 2020 3:04 am

Thu Dec 10, 2020 12:29 pm

Thank you very much. We will give it a try.
Btw, we use version 8.7.5 installed via NuGet.

fhellerhoff
 
Posts: 6
Joined: Mon Jan 17, 2011 2:12 pm

Fri Dec 11, 2020 2:44 am

Hello,

Thank you for sharing more information.
Once there is any progress on SPIREDOC-5320, I will inform you immediately.

Sincerely,
Brian
E-iceblue support team

Brian.Li
 
Posts: 1271
Joined: Mon Oct 19, 2020 3:04 am

Mon Dec 21, 2020 5:30 pm

Hello,

I tried the workaround you posted - thanks for that.
Unfortunately, it does not produce the same text that we see in the Word file. As the data is sensitive, I am not allowed to post the full document, but maybe this example helps.

This is what we see when we open the document in MS Word:
Diagnose:
Frische BWK 8 Fraktur


If we activate the field codes view, it looks like this (see 2020-12-21_18h27_39.png).

If I run the workaround code you posted, I get:
Diagnose:
Frische BWK 8 FrakturFehler! Keine Dokumentvariable verfügbar.Fehler! Keine Dokumentvariable verfügbar.Fehler! Keine Dokumentvariable verfügbar.Fehler! Keine Dokumentvariable verfügbar.="Fehler! Keine Dokumentvariable verfügbar." "" "Fehler! Keine Dokumentvariable verfügbar.Fehler! Keine Dokumentvariable verfügbar." \* MERGEFORMAT



What can we do to get the correct field results?

Thanks in advance and best regards.

fhellerhoff
 
Posts: 6
Joined: Mon Jan 17, 2011 2:12 pm

Tue Dec 22, 2020 10:13 am

Hello,

Thanks for your feedback.
For this scenario, please refer to the following modified code. If there are any questions, just feel free to contact us.


Code:
            StringBuilder sb = new StringBuilder();
            Document doc = new Document();
            doc.LoadFromFile("IFField.docx");
            Section section = doc.Sections[0];
            ParagraphCollection paragraphs = section.Body.Paragraphs;

            foreach (Paragraph paragraph in paragraphs)
            {
                DocumentObjectCollection childObjects = paragraph.ChildObjects;

                for (int i = 0; i < childObjects.Count; i++)
                {
                    if (childObjects[i].DocumentObjectType == DocumentObjectType.Field)
                    {
                        Field field = childObjects[i] as Field;
                        sb.Append(field.FieldText);

                        // Skip the remaining child objects of this field, up to the
                        // next field end mark, so they are not appended a second time.
                        while (++i < childObjects.Count)
                        {
                            DocumentObject tempobj = childObjects[i];
                            if (tempobj is FieldMark && (tempobj as FieldMark).Type == FieldMarkType.FieldEnd)
                            {
                                break;
                            }
                        }
                    }
                    else if (childObjects[i].DocumentObjectType == DocumentObjectType.TextRange)
                    {
                        // Plain text outside of any field.
                        TextRange textRange = childObjects[i] as TextRange;
                        sb.Append(textRange.Text);
                    }
                }
                sb.AppendLine();
            }
            File.WriteAllText("result.txt", sb.ToString());


Sincerely,
Brian
E-iceblue support team

Brian.Li
 
Posts: 1271
Joined: Mon Oct 19, 2020 3:04 am

Fri Jan 08, 2021 9:49 am

Hello Brian,
happy new year! :)
Thanks for the modified code. The output looks better now, but I still need to hide the variables that do not seem to be set. In the screenshot, they appear as "Fehler! Keine Dokumentvariable verfügbar." (the German Word message for "Error! No document variable supplied.").

2021-01-08_10h44_35.png


Any ideas for that? Can I somehow detect and remove them?

Thanks in advance and best regards, Friedhelm

fhellerhoff
 
Posts: 6
Joined: Mon Jan 17, 2011 2:12 pm

Fri Jan 08, 2021 11:17 am

Hello,

Thanks for your feedback.
Please refer to the following modified code. Attached are my test file and result file.
Code:
            StringBuilder sb = new StringBuilder();
            Document doc = new Document();
            doc.LoadFromFile("IFField1.docx");
            Section section = doc.Sections[0];
            ParagraphCollection paragraphs = section.Body.Paragraphs;

            foreach (Paragraph paragraph in paragraphs)
            {
                DocumentObjectCollection childObjects = paragraph.ChildObjects;

                for (int i = 0; i < childObjects.Count; i++)
                {
                    if (childObjects[i].DocumentObjectType == DocumentObjectType.Field)
                    {
                        Field field = childObjects[i] as Field;
                        // End mark that belongs to this (outer) field.
                        FieldMark fm = field.End;
                        sb.Append(field.FieldText);

                        // Skip ahead until this field's own end mark is reached.
                        // End marks of nested fields do not match fm and are skipped.
                        while (++i < childObjects.Count)
                        {
                            DocumentObject tempobj = childObjects[i];
                            if (tempobj is FieldMark
                                && (tempobj as FieldMark).Type == FieldMarkType.FieldEnd
                                && (tempobj as FieldMark) == fm)
                            {
                                break;
                            }
                        }
                    }
                    else if (childObjects[i].DocumentObjectType == DocumentObjectType.TextRange)
                    {
                        // Plain text outside of any field.
                        TextRange textRange = childObjects[i] as TextRange;
                        sb.Append(textRange.Text);
                    }
                }
                sb.AppendLine();
            }
            File.WriteAllText("result.txt", sb.ToString());

If there are any questions, please provide your source file to help us investigate further. Thanks in advance.

Sincerely,
Brian
E-iceblue support team

Brian.Li
 
Posts: 1271
Joined: Mon Oct 19, 2020 3:04 am

Mon Jan 11, 2021 4:49 pm

Hello Brian,

Thanks again for the update. It looks much better, but the document seems to have a lot of variables where field.value is just a couple of blanks.
Where can I find a detailed description of the Field class? I think I need to check more of the content of the fields.
Best regards.

fhellerhoff
 
Posts: 6
Joined: Mon Jan 17, 2011 2:12 pm

Tue Jan 12, 2021 9:51 am

Hello,

Thanks for your response.
Here is the API documentation for fields, for your reference: https://www.e-iceblue.com/Tutorials/API ... tream.html
If there are any other questions, please feel free to contact us.
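For fields whose text is only blanks or the "Fehler! Keine Dokumentvariable verfügbar." message, one possible approach is to filter the field text before appending it to the output. The following is only a minimal sketch and not part of the Spire.Doc API: the class FieldTextFilter, its Clean method and the ErrorMarkers list are hypothetical names, and the sketch assumes that unset variables always appear as the exact German error text quoted earlier in this thread.

```csharp
using System;

public static class FieldTextFilter
{
    // Assumption: unset document variables surface as this literal, localized
    // Word error text (copied from the output quoted earlier in this thread).
    // Other Word language versions use different wording; add them here.
    private static readonly string[] ErrorMarkers =
    {
        "Fehler! Keine Dokumentvariable verfügbar.",
    };

    // Returns the field text with known error markers removed.
    // Null, empty, or whitespace-only results collapse to an empty string.
    public static string Clean(string fieldText)
    {
        if (string.IsNullOrWhiteSpace(fieldText))
        {
            return string.Empty;
        }

        string result = fieldText;
        foreach (string marker in ErrorMarkers)
        {
            result = result.Replace(marker, string.Empty);
        }

        return string.IsNullOrWhiteSpace(result) ? string.Empty : result;
    }
}
```

In the extraction loop posted earlier, the call sb.Append(field.FieldText) could then be replaced with sb.Append(FieldTextFilter.Clean(field.FieldText)). Matching on a literal string is fragile, so please verify it against your own documents.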

Sincerely,
Brian
E-iceblue support team

Brian.Li
 
Posts: 1271
Joined: Mon Oct 19, 2020 3:04 am

Mon Apr 19, 2021 9:41 am

Hello,

Greetings from E-iceblue!
We are glad to inform you that we have just released Spire.Doc Pack (hot fix) Version 9.4.12, which fixes your issue SPIREDOC-5320. Please download it from one of the following links and test it on your side. We look forward to your test results.
Website link: https://www.e-iceblue.com/Download/down ... t-now.html
NuGet link: https://www.nuget.org/packages/Spire.Doc/9.4.12

Sincerely,
Brian
E-iceblue support team

Brian.Li
 
Posts: 1271
Joined: Mon Oct 19, 2020 3:04 am

Sun Apr 25, 2021 6:15 am

Hello,

Greetings from E-iceblue.
Does this hotfix solve your issue? Could you please give us some feedback at your convenience?

Sincerely,
Brian
E-iceblue support team

Brian.Li
 
Posts: 1271
Joined: Mon Oct 19, 2020 3:04 am
