Spire.PDF is a professional PDF library for creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, and Python.

Mon Nov 19, 2018 8:23 am

I have an issue extracting text from a PDF that was scanned and OCR'd in Greece. When I call ExtractText, the returned string includes a bunch of undisplayable characters. So although I can cut and paste the contents from Adobe Reader, I can't get at them from within my software.

Is there a way to specify the codepage or encoding of the string returned by ExtractText or do you have any alternate advice?

Many thanks in advance,

Darren.

DarrenWray
 
Posts: 8
Joined: Mon May 07, 2018 1:40 pm
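
For readers hitting the same symptom, it can help to pin down exactly which characters in the extracted string are "undisplayable" before reporting the problem. The following is a minimal Python sketch, assuming the extracted text is already available as an ordinary string. The helper name `find_suspicious_chars` and the choice of Unicode categories are illustrative assumptions, not part of Spire.PDF.

```python
import unicodedata

# Unicode general categories treated as "undisplayable" in this sketch
# (an assumption): Cc = control, Co = private use, Cn = unassigned,
# Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}

# Ordinary whitespace controls that extracted text legitimately contains.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def find_suspicious_chars(text):
    """Return (index, code point, category) for each suspicious character."""
    hits = []
    for index, ch in enumerate(text):
        if ch in ALLOWED_CONTROLS:
            continue
        category = unicodedata.category(ch)
        # U+FFFD is the replacement character that decoders emit on failure.
        if ch == "\ufffd" or category in SUSPICIOUS_CATEGORIES:
            hits.append((index, f"U+{ord(ch):04X}", category))
    return hits


if __name__ == "__main__":
    # Hypothetical extracted string containing a control character and a
    # private-use character after some Greek text.
    sample = "\u0391\u03b8\u03ae\u03bd\u03b1\x02\uf020 2018"
    for hit in find_suspicious_chars(sample):
        print(hit)
```

A string made up mostly of private-use or control characters suggests that glyph codes were passed through without being mapped to Unicode, which is useful detail to include with a sample file.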

Mon Nov 19, 2018 9:02 am

Hi Darren Wray,

Thank you for your message.
Spire.PDF uses Unicode as the default encoding. At present, it does not support specifying a different encoding.
Could you please share your sample PDF document so that we can look into it?

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am
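
Since the library returns a Unicode string and offers no encoding parameter, one thing a caller could try on their own side is to undo a suspected single-byte codepage mix-up after extraction. The Python sketch below assumes, purely for illustration, that Greek text stored as Windows-1253 bytes was wrongly decoded as Latin-1. The function name `redecode` and the default codec pair are assumptions. It does not help when the garbling comes from the extraction itself rather than from a codepage mismatch.

```python
def redecode(text, wrong="latin-1", right="cp1253"):
    """Undo a single-byte mis-decoding.

    Re-encodes `text` with the codec assumed to have been applied by
    mistake, then decodes the resulting bytes with the intended codec.
    Returns None when the round trip is not possible, which indicates
    that the assumption does not fit this string.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1"
    # Simulate the assumed failure: cp1253 bytes read as Latin-1.
    garbled = greek.encode("cp1253").decode("latin-1")
    print(garbled)
    print(redecode(garbled))
```

If `redecode` returns None, or returns text that is still unreadable, the problem is more likely in how character codes were mapped during extraction than in the string's encoding.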

Mon Nov 19, 2018 2:06 pm

An example page is attached. I can cut and paste from this document in both Adobe Reader (Windows and Mac) and the macOS Preview app, but as shown in the screenshot in Example.zip, the values returned by ExtractText are unintelligible.

Thanks in advance,

Darren.

DarrenWray
 
Posts: 8
Joined: Mon May 07, 2018 1:40 pm

Tue Nov 20, 2018 1:50 am

Dear Darren,

Thank you for providing the sample.
When I copy the content directly in Adobe, or extract the text in Adobe, the result is still incorrect. Note the spaces between the characters.
Adobe.jpg


Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Tue Nov 20, 2018 7:01 am

Actually, the text in your screenshot is pretty accurate - please see the attached showing your screenshot and the PDF side by side.

Side by Side.png


OCR quality aside, is there a way to get the text from this PDF without what appear to be codepage errors?

I'm not sure if it is related, but when I look at the fonts used in the PDF, the encoding is set to custom. This seems to be a common trait among the PDFs that I can't process.

Any and all help appreciated,

Darren.

DarrenWray
 
Posts: 8
Joined: Mon May 07, 2018 1:40 pm
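
To illustrate why a font with a custom encoding can produce unreadable output, here is a simplified Python sketch. In a PDF, the bytes in a text string are character codes whose meaning depends on the font, and a per-font mapping such as a ToUnicode CMap is what lets an extractor turn them into Unicode. The table `TO_UNICODE` and both helper functions are hypothetical, and this is not a description of how Spire.PDF works internally.

```python
# Hypothetical per-font table from character codes to Unicode text,
# standing in for the information a ToUnicode CMap provides.
TO_UNICODE = {
    0x01: "\u0391",  # GREEK CAPITAL LETTER ALPHA
    0x02: "\u03b8",  # GREEK SMALL LETTER THETA
    0x03: "\u03ae",  # GREEK SMALL LETTER ETA WITH TONOS
    0x04: "\u03bd",  # GREEK SMALL LETTER NU
    0x05: "\u03b1",  # GREEK SMALL LETTER ALPHA
}


def decode_with_map(codes, to_unicode):
    """Map each character code through the font's table.

    Codes missing from the table become U+FFFD so gaps stay visible.
    """
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


def decode_naively(codes):
    """Treat each code as if it were already a Unicode code point."""
    return "".join(chr(code) for code in codes)


if __name__ == "__main__":
    codes = bytes([0x01, 0x02, 0x03, 0x04, 0x05])
    print(repr(decode_naively(codes)))   # control characters
    print(decode_with_map(codes, TO_UNICODE))
```

The naive path yields control characters, much like the undisplayable output described above, while the mapped path recovers readable Greek text from the same codes.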

Tue Nov 20, 2018 8:13 am

Dear Darren,

After further investigation, we found that you are right.
I have logged the issue in our bug tracking system with a high priority and our dev team is now investigating the issue.
Once there's an update, I will inform you.
Sorry for the inconvenience caused.

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Tue Nov 20, 2018 10:08 am

Thanks Jane, that is perfect 8)

DarrenWray
 
Posts: 8
Joined: Mon May 07, 2018 1:40 pm

Wed Nov 28, 2018 5:49 am

Dear Darren,

Thank you for your patience.
I'm glad to inform you that your issue has been resolved. Please download the hotfix from the following link.
https://www.e-iceblue.com/downloads/Tem ... 3.11.6.zip

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Sat Dec 01, 2018 11:39 pm

Thank you - I'll download and implement.

DarrenWray
 
Posts: 8
Joined: Mon May 07, 2018 1:40 pm

Mon Dec 03, 2018 2:11 am

Hi Darren,

Thank you for your quick response.
I look forward to your feedback once you have tested the hotfix.

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Fri Dec 07, 2018 8:38 am

Hi Darren,

Greetings from e-iceblue!
Have you tried the hotfix?
Your feedback would be greatly appreciated!

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Wed Apr 10, 2019 3:11 pm

I have the same problem. Could you resend the version that you uploaded before?

adch
 
Posts: 5
Joined: Mon Mar 11, 2019 7:16 pm

Thu Apr 11, 2019 1:44 am

Hello Adch,

Thank you for contacting us.
Please download and test our latest Spire.PDF Pack (Hot Fix) Version 5.4.1, which includes all the fixes and new features. If the problem still occurs, please provide your input file, your full test code and your output file so that we can look into it further. You can send them to us via email (support@e-iceblue.com).

Sincerely,
Lisa
E-iceblue support team

Lisa.Li
 
Posts: 1261
Joined: Wed Apr 25, 2018 3:20 am

Wed Apr 17, 2019 7:02 am

Hello Adch,

Greetings from E-iceblue.
Did the latest version work for you? Thanks in advance for your feedback and time.

Sincerely,
Lisa
E-iceblue support team

Lisa.Li
 
Posts: 1261
Joined: Wed Apr 25, 2018 3:20 am

Fri Apr 19, 2019 7:17 pm

Hi there, I may have posted in the wrong place :D I am using Spire in a Java app, so I need the jar file. Could you help with it?

Thanks in advance

adch
 
Posts: 5
Joined: Mon Mar 11, 2019 7:16 pm
