Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files. Get free and professional technical support for Spire.Doc for .NET, Java, Android, C++, Python.

Fri Jun 12, 2015 10:31 pm

I'm trying to go through a Word document and obtain all of the text. I use the method from the tutorial where I go through each section and then obtain every paragraph:

foreach (Section section in document.Sections)
{
    foreach (Paragraph paragraph in section.Paragraphs)
    {
        sb.AppendLine(paragraph.Text);
    }
}

The problem is that paragraph.Text does not return the list number next to the paragraph. It skips the list number entirely: for example, when I want paragraph.Text to say "1. SCOPE", it just returns "SCOPE". How do I get the list number to appear alongside the paragraph text?

robertop19.rp
 
Posts: 8
Joined: Mon Apr 27, 2015 10:36 pm

Mon Jun 15, 2015 5:58 am

Hello,

Sorry for the late reply; it was the weekend.
If you want to extract text that includes the list number next to the paragraph, the only way at present is to save the document directly to a txt file, as below:
Code:
document.SaveToFile("result.txt", FileFormat.Txt);

Kindly note: conversion from docx to txt is only supported in the Pro Edition. Please contact our sales team (sales@e-iceblue.com) to upgrade your Spire.Doc license from the Standard Edition to the Pro Edition.

Sincerely,
Betsy
E-iceblue support team

Betsy
 
Posts: 802
Joined: Mon Jan 19, 2015 6:14 am

Wed Jun 17, 2015 9:55 am

Hello,

Has your issue been resolved? Could you please give us some feedback?

Thanks,
Betsy
E-iceblue support team

Betsy
 
Posts: 802
Joined: Mon Jan 19, 2015 6:14 am

Fri Jun 19, 2015 3:10 pm

Not exactly. This is what I did:

Spire.Doc.Document document = new Spire.Doc.Document(docPath);
//try saving the file in .txt format in order to get the numbered lists
string textFileName = mainPath + @"result.txt";
document.SaveToFile(textFileName, FileFormat.Txt);
document.LoadFromFile(textFileName);
//and then I proceed to get the text from each paragraph in document

When I look at result.txt, though, this is an example of what I get:

1. SCOPE

.1 Scope. This specification covers...   <-- this should be "1.1 Scope..."

.2 Classification. The insect repellent covered by...   <-- this should be "1.2 Classification..."

Do I have to change the encoding? Please tell me what I am doing wrong. Thanks.
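
For the export-then-read approach described above, reloading the .txt file into a Document may not be necessary. A minimal sketch, assuming the exported file is plain text with one paragraph per line (the thread does not confirm this layout), reads it back with standard .NET file APIs only. The ExportedTextReader class and ReadParagraphs helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ExportedTextReader
{
    // Hypothetical helper: returns the non-blank lines of an exported
    // text file, trimmed of surrounding whitespace.
    // Assumes one paragraph per line in the exported file.
    public static List<string> ReadParagraphs(string path)
    {
        var paragraphs = new List<string>();
        foreach (string line in File.ReadLines(path))
        {
            string trimmed = line.Trim();
            if (trimmed.Length > 0)
            {
                paragraphs.Add(trimmed);
            }
        }
        return paragraphs;
    }
}
```

Calling ExportedTextReader.ReadParagraphs(textFileName) after SaveToFile would then give one string per exported paragraph, with whatever numbering the export produced.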

robertop19.rp
 
Posts: 8
Joined: Mon Apr 27, 2015 10:36 pm

Mon Jun 22, 2015 8:23 am

Hello,

Thanks for your feedback. I have reproduced the issue you mentioned and forwarded it to our Dev team. Once there is any update, we will let you know immediately. Sorry for the inconvenience.
Thanks,
Gary
E-iceblue support team

Gary.zhang
 
Posts: 1380
Joined: Thu Apr 04, 2013 1:30 am

Thu Jul 02, 2015 2:29 am

Hello,

Thanks for waiting. The issue has now been resolved, and the newest version of Spire.Doc has been released. You can download Spire.Doc Pack Version 5.5 and try it.
Please let us know if you have any questions.
Sincerely,
Gary
E-iceblue support team

Gary.zhang
 
Posts: 1380
Joined: Thu Apr 04, 2013 1:30 am

Mon Jul 06, 2015 6:09 am

Hello,

Have you tried the latest version? Has your issue been resolved?

Thanks,
Betsy
E-iceblue support team

Betsy
 
Posts: 802
Joined: Mon Jan 19, 2015 6:14 am

Mon Jul 06, 2015 4:36 pm

Yes, the text file now has the correct numbers, but it's still inconvenient that I cannot get the text directly from the Word file. Can you add a feature that lets me get the list number directly from the Word file?

robertop19.rp
 
Posts: 8
Joined: Mon Apr 27, 2015 10:36 pm

Tue Jul 07, 2015 2:43 am

Hello,

Thanks for your feedback. We have added this new feature to our schedule. Because of other urgent work, it may need more time to finish. When it is done, we will let you know.

Sincerely,
Betsy
E-iceblue support team

Betsy
 
Posts: 802
Joined: Mon Jan 19, 2015 6:14 am

Thu Jul 23, 2015 3:14 am

Hello,

Thanks for waiting. The newest hotfix, Spire.Doc Version 5.5.30, has been released, and it includes the new feature. Please try the code below with that version.
Code:
Document extractdoc = new Document("sample.docx");
foreach (Section section in extractdoc.Sections)
{
    foreach (Paragraph paragraph in section.Paragraphs)
    {
        //use Paragraph.ListText to get the list number
        string st = paragraph.ListText + paragraph.Text;
    }
}
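
The loop above can be folded into the StringBuilder pattern from the original question. The following is a rough sketch, not a confirmed usage pattern: it assumes Spire.Doc 5.5.30 or later, assumes Paragraph.ListText is null or empty for paragraphs that are not list items, and makes no assumption about whether ListText ends with a separator. The ListTextExtractor class and its FormatLine and ExtractText helpers are hypothetical names introduced here.

```csharp
using System.Text;
using Spire.Doc;
using Spire.Doc.Documents;

static class ListTextExtractor
{
    // Hypothetical helper: prefixes the paragraph text with its list label.
    // Trailing whitespace on the label is trimmed and a single space added,
    // since the exact separator returned by ListText is an assumption here.
    public static string FormatLine(string listText, string text)
    {
        if (string.IsNullOrEmpty(listText))
        {
            return text;
        }
        return listText.TrimEnd() + " " + text;
    }

    // Collects every paragraph of the document, one per line,
    // including list numbers where ListText provides them.
    public static string ExtractText(string docPath)
    {
        Document document = new Document(docPath);
        StringBuilder sb = new StringBuilder();
        foreach (Section section in document.Sections)
        {
            foreach (Paragraph paragraph in section.Paragraphs)
            {
                sb.AppendLine(FormatLine(paragraph.ListText, paragraph.Text));
            }
        }
        return sb.ToString();
    }
}
```

Keeping the label formatting in its own method means the spacing between number and text can be adjusted in one place once the actual ListText output is known.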

If you have any questions, please feel free to get back to us.

Best Regards,
Betsy
E-iceblue support team

Betsy
 
Posts: 802
Joined: Mon Jan 19, 2015 6:14 am

Fri Jul 31, 2015 4:42 pm

Thank you for your assistance. The program now works like I intended it to.

robertop19.rp
 
Posts: 8
Joined: Mon Apr 27, 2015 10:36 pm

Mon Aug 03, 2015 1:17 am

Hello,

Thanks for your response. Please feel free to contact us if you have any further questions or needs.

Sincerely,
Betsy
E-iceblue support team

Betsy
 
Posts: 802
Joined: Mon Jan 19, 2015 6:14 am
