Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files. Get free and professional technical support for Spire.Doc for .NET, Java, Android, C++, Python.

Wed Nov 22, 2017 1:32 pm

I am trying to convert a docx file to text using Spire's SaveToFile method.
In one place, the sequence of the text in the output text file does not match the document.

Can someone help me understand why this is happening? The rest of the docx-to-text conversion works properly.

Thanks in advance.

ParasXOR
 
Posts: 28
Joined: Wed Nov 22, 2017 6:13 am

Thu Nov 23, 2017 1:41 am

Hello,

Thanks for your post. To help us investigate the issue, would you please share your docx file and point out where the text is out of sequence?

Best regards,
Simon
E-iceblue support team

Simon.yang
 
Posts: 620
Joined: Wed Jan 11, 2017 2:03 am

Thu Nov 23, 2017 6:25 am

Please check the highlighted area of the file.

Text output:
Art. 13 Scope
Section 4: (repealed by the Law of 10 November 2009) “Chapter 2: “Authorisation of PFS”76…”77
Section 1: General provisions”78


I am getting the text output above.
Here, "Art. 13 Scope" should come after "Section 1: General provisions" in the text.

ParasXOR
 
Posts: 28
Joined: Wed Nov 22, 2017 6:13 am

Thu Nov 23, 2017 8:01 am

Hello,

Thanks for sharing the file. After further investigation, we found that there is a column break in section 2. "Art. 13 Scope" is in the left column, and the "Section 4:..." part comes after it in the right column. So the resulting text is correct. I hope this makes it clear.

Best regards,
Simon
E-iceblue support team

Simon.yang
 
Posts: 620
Joined: Wed Jan 11, 2017 2:03 am

Fri Nov 24, 2017 9:31 am

Hello,

Greetings from E-iceblue.
Did I explain it clearly?
We would appreciate it if you could give us some feedback.

Best regards,
Simon
E-iceblue support team

Simon.yang
 
Posts: 620
Joined: Wed Jan 11, 2017 2:03 am

Thu Dec 14, 2017 7:53 am

Please find the attached file. When converting it from DOC to text, the sequence is mismatched.

Please check: the contents of Art. 68, points 7 to 12, appear under Art. 69.
Part IIa, Art. 76 and Art. 77 are not in the proper sequence.

Please check the portions of the file mentioned above.

Thanks

ParasXOR
 
Posts: 28
Joined: Wed Nov 22, 2017 6:13 am

Thu Dec 14, 2017 9:50 am

Hello,

Thanks for your feedback.
Actually, Spire.Doc extracts text in the order of the document's internal storage sequence rather than its display sequence, which follows the same rule as MS Word. If a Word page has columns, the internal storage sequence runs from left to right.
The Word file you shared this time is the same as the one you shared before: the pages where the text appears out of sequence contain columns. For example, for Art. 69 on page 21, "Art 69" is in the left column and the remaining part comes after it in the right column.
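
To check which parts of a file are affected, one option is to read the .docx package directly and list the paragraphs in storage order, marking each column break. The following is only a rough sketch using the Python standard library, not Spire.Doc's implementation; the function names storage_order_lines and column_counts are hypothetical, and it assumes a .docx (OOXML) file, not a binary .doc.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _load_document_root(docx_source):
    """Parse word/document.xml from a .docx path or file-like object."""
    with zipfile.ZipFile(docx_source) as archive:
        return ET.fromstring(archive.read("word/document.xml"))


def storage_order_lines(docx_source):
    """Return paragraph texts in storage order, marking column breaks.

    Rough sketch: text boxes nested inside a paragraph are not handled
    specially, so their text may be listed more than once.
    """
    root = _load_document_root(docx_source)
    lines = []
    for para in root.iter(f"{{{W}}}p"):
        parts = []
        for node in para.iter():
            if node.tag == f"{{{W}}}t":
                parts.append(node.text or "")
            elif node.tag == f"{{{W}}}br" and node.get(f"{{{W}}}type") == "column":
                # An explicit column break: the following text is displayed
                # in the next column but stored right here.
                parts.append("[COLUMN BREAK]")
        lines.append("".join(parts))
    return lines


def column_counts(docx_source):
    """Return the declared column count of every w:cols element."""
    root = _load_document_root(docx_source)
    return [int(cols.get(f"{{{W}}}num", "1")) for cols in root.iter(f"{{{W}}}cols")]
```

If column_counts reports a value greater than 1, or a "[COLUMN BREAK]" marker shows up near the text in question, the order in the text output most likely reflects the column layout described above.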

Best regards,
Simon
E-iceblue support team

Simon.yang
 
Posts: 620
Joined: Wed Jan 11, 2017 2:03 am

Mon Jan 29, 2018 11:43 am

Pages 6-7 of the doc file:
the text under paragraphs (b), (d) and (e) is not captured in the converted text file.
Please find the attached doc file.
The TOC, tables and strikethrough text are removed when converting to a text file.

ParasXOR
 
Posts: 28
Joined: Wed Nov 22, 2017 6:13 am

Tue Jan 30, 2018 3:49 am

Hi ParasXOR,

Thanks for your inquiry.
After an initial test with the latest version (Spire.Doc Pack (hot fix) Version: 6.1.17), I did not notice the issue you mentioned, "Text under paragraphs (b), (d) and (e) is not captured in the converted text file." If I missed something, please share a result file or screenshot that demonstrates the problem. As for the loss of the table, sorry, but our Spire.Doc is based on the MS Word standard. If you save the document as a .txt file in Word, you would see that there is no table at all; even the table content has been removed. For our customers' convenience, Spire.Doc keeps the table content. Regarding the TOC and strikethrough cases, I didn't find the related parts in your document. Please double-check. In addition, kindly note that the TOC effect is impossible in a .txt file; only the text content can be converted.
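
To compare the text with and without table content, a small stand-alone check can help. This is a minimal sketch using only the Python standard library; it reads the .docx package directly and does not reproduce Spire.Doc's or Word's conversion. The function name body_text and the include_tables flag are hypothetical, and a binary .doc file would first need to be saved as .docx.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P, TBL, TR, TC, T = (f"{{{W}}}{name}" for name in ("p", "tbl", "tr", "tc", "t"))


def _text_of(element):
    """Concatenate every w:t text node found below the given element."""
    return "".join(node.text or "" for node in element.iter(T))


def body_text(docx_source, include_tables=True):
    """Return top-level body blocks as lines of text, in storage order.

    Paragraphs become one line each. Each table row becomes one
    tab-separated line, or is skipped when include_tables is False.
    """
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for block in root.find(f"{{{W}}}body"):
        if block.tag == P:
            lines.append(_text_of(block))
        elif block.tag == TBL and include_tables:
            for row in block.findall(TR):
                lines.append("\t".join(_text_of(cell) for cell in row.findall(TC)))
    return lines
```

Running it once with include_tables=True and once with include_tables=False, then comparing the two results, shows which lines of the output come from tables.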

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Tue Jan 30, 2018 6:51 am

Using the Spire library, we remove the TOC, tables, etc. when converting from DOC to text,
so please verify after removing those portions.

ParasXOR
 
Posts: 28
Joined: Wed Nov 22, 2017 6:13 am

Tue Jan 30, 2018 7:39 am

Hi ParasXOR,

I'm sorry for that.
If you have any problems with our product in the future, please come back to us.

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am
