OCR Feature

Tue Aug 22, 2017 2:58 pm

Hi,
I would like to know whether we can extract text from a PDF made from scanned images using the licensed version of Spire.PDF.

Thanks and Regards
Rahul

Wed Aug 23, 2017 4:01 am

Hello,

Thanks for your inquiry.
Yes, our Spire.Pdf supports that feature. Please use the code below.

Code:

    using System.IO;
    using System.Text;
    using Spire.Pdf;

    // PdfFile is the path to the input PDF.
    PdfDocument doc = new PdfDocument();
    doc.LoadFromFile(PdfFile);

    // Collect the text of every page.
    StringBuilder content = new StringBuilder();
    foreach (PdfPageBase page in doc.Pages)
    {
        content.Append(page.ExtractText());
    }

    String fileName = "TextFromPDF.txt";
    File.WriteAllText(fileName, content.ToString());

Sincerely,
Jane
E-iceblue support team

Wed Aug 23, 2017 7:44 am

Hi,
I tried that code, but it does not extract text (as Tesseract OCR does) from a PDF made from scanned images. It throws an error at System.Drawing.Bitmap..ctor(Int32 width, Int32 height, PixelFormat format).

Wed Aug 23, 2017 8:15 am

Hello,

Thanks for your quick response.
To help us investigate further, could you please send your sample PDF file to us via email (support@e-iceblue.com)?

Sincerely,
Jane
E-iceblue support team

Wed Aug 23, 2017 8:23 am

I have sent an email with the PDF attached to support@e-iceblue.com from rahulk4@chetu.com.
Please check it.

Wed Aug 23, 2017 8:55 am

Hello,

Thanks for your letter.
After further testing, we found that the OCR feature is not available at present. I made a mistake by testing with a special PDF file.
I apologize for that.

Sincerely,
Jane
E-iceblue support team

Wed Aug 23, 2017 9:10 am

For some pages of that PDF it works fine, but on other pages this line gives me an error: Image[] images = page.ExtractImages(); Because of this, I am not able to compress the PDF file using image compression in Spire.PDF.
Please resolve this issue.

Wed Aug 23, 2017 9:30 am

Hello,

Thanks for your quick response.
If you simply want to extract images, that is not related to the OCR feature, and it can be achieved.
I tested your document with the latest hotfix (Spire.PDF Pack (Hot Fix) Version: 3.9.285) and everything worked well.
Please try this version first. If the issue still exists on your side, please write back and share more details about your running environment.

Sincerely,
Jane
E-iceblue support team

Wed Aug 23, 2017 9:37 am

But we need the OCR feature for scanned PDFs. How can we achieve this? Please suggest.

Wed Aug 23, 2017 9:46 am

The latest hotfix (Spire.PDF Pack (Hot Fix) Version: 3.9.285) also gives the same error.

Wed Aug 23, 2017 10:15 am

Hello,

Thanks for your response.
Our Spire.Pdf does not support the OCR feature, and we have no good suggestion in this respect.
As for the exception thrown by "ExtractImages()", we will dig into it and reply to you ASAP.

Sincerely,
Jane
E-iceblue support team

Wed Sep 20, 2017 8:02 am

Hi Rahul Kumar,

So sorry for the late reply.
Regarding the exception thrown by "ExtractImages()", I have reproduced it and logged it in our bug system. Once there is any update, I will inform you.

Sincerely,
Jane
E-iceblue support team

Fri Oct 06, 2017 1:25 pm

Team, do we have OCR support now? We have just procured licenses for PDF doc and PDF net.

Mon Oct 09, 2017 2:42 am

Dear yogeshmsharma,

Sorry for the late reply due to the weekend.
Sorry, Spire.PDF doesn't support OCR at present. However, we have added this new feature request to our system, and we will let you know once there is any progress. In addition, I am afraid it cannot be implemented in a short time due to its complexity.

Sincerely,
Betsy
E-iceblue support team

Wed Oct 11, 2017 1:01 pm

Thanks for the reply. Is there any tentative deadline, such as next month or next quarter?
