Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Tue Aug 22, 2017 2:58 pm

Hi,
I want to know whether we can extract text from a PDF made from scanned images using the licensed version of Spire.PDF.

Thanks and Regards
Rahul

rahulkkush@gmail.com
 
Posts: 8
Joined: Tue Aug 22, 2017 2:55 pm

Wed Aug 23, 2017 4:01 am

Hello,

Thanks for your inquiry.
Yes, our Spire.Pdf supports that feature. Please use the code below.
Code:
using System.IO;
using System.Text;
using Spire.Pdf;

// PdfFile is the path to the input PDF.
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(PdfFile);
StringBuilder content = new StringBuilder();
// Append the text extracted from each page.
foreach (PdfPageBase page in doc.Pages)
{
    content.Append(page.ExtractText());
}
string fileName = "TextFromPDF.txt";
File.WriteAllText(fileName, content.ToString());


Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Wed Aug 23, 2017 7:44 am

Hi,
I tried that code, but it does not extract text (the way Tesseract OCR does) from a PDF made from scanned images. It throws an error at System.Drawing.Bitmap..ctor(Int32 width, Int32 height, PixelFormat format).

rahulkkush@gmail.com
 
Posts: 8
Joined: Tue Aug 22, 2017 2:55 pm

Wed Aug 23, 2017 8:15 am

Hello,

Thanks for your quick response.
To help us investigate, could you please send your sample PDF file to us via email (support@e-iceblue.com)?

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Wed Aug 23, 2017 8:23 am

I have sent an email with the PDF attached to support@e-iceblue.com from rahulk4@chetu.com.
Please check it.

rahulkkush@gmail.com
 
Posts: 8
Joined: Tue Aug 22, 2017 2:55 pm

Wed Aug 23, 2017 8:55 am

Hello,

Thanks for your letter.
After further testing, we found that the OCR feature is not available at present. My earlier answer was a mistake: I had tested with a special PDF file.
I apologize for that.

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am
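
Since ExtractText() only returns text that is already stored in the page, a page that is purely a scanned image yields nothing. The following is a minimal sketch, not an official E-iceblue sample, of how one might flag such pages before handing them to an external OCR tool. It uses only the Spire.PDF calls that appear in this thread (LoadFromFile, Pages, ExtractText, ExtractImages) and assumes the 3.9.x API discussed here; the class ScannedPageCheck and the helper LooksScanned are hypothetical names. An unlicensed build may add its own notice to the extracted text, which this check does not account for.

```csharp
using System;
using System.Drawing;
using Spire.Pdf;

static class ScannedPageCheck
{
    // Hypothetical helper (not part of Spire.PDF): a page that has images
    // but no extractable text is probably a scan and would need OCR.
    public static bool LooksScanned(string extractedText, int imageCount)
    {
        return string.IsNullOrWhiteSpace(extractedText) && imageCount > 0;
    }

    static void Main(string[] args)
    {
        // Assumption: the input PDF path is passed as the first argument.
        PdfDocument doc = new PdfDocument();
        doc.LoadFromFile(args[0]);

        int pageNumber = 0;
        foreach (PdfPageBase page in doc.Pages)
        {
            pageNumber++;
            try
            {
                string text = page.ExtractText();
                Image[] images = page.ExtractImages();
                int imageCount = images == null ? 0 : images.Length;

                if (LooksScanned(text, imageCount))
                {
                    Console.WriteLine("Page " + pageNumber + ": no text layer, OCR needed");
                }
            }
            catch (Exception ex)
            {
                // ExtractImages() is reported in this thread to throw on some pages.
                Console.WriteLine("Page " + pageNumber + ": extraction failed: " + ex.Message);
            }
        }
    }
}
```

Pages flagged this way would still have to be processed by a separate OCR engine such as Tesseract, which the original poster mentions.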

Wed Aug 23, 2017 9:10 am

For some pages of that PDF it works fine, but on other pages it gives me an error at (Image[] images = page.ExtractImages();). Because of this, I am not able to compress a PDF file using image compression in Spire.PDF.
Please resolve this issue.

rahulkkush@gmail.com
 
Posts: 8
Joined: Tue Aug 22, 2017 2:55 pm

Wed Aug 23, 2017 9:30 am

Hello,

Thanks for your quick response.
If you simply want to extract images, that is unrelated to the OCR feature, and it can be achieved.
I tested your document with the latest hotfix (Spire.PDF Pack (Hot Fix) Version: 3.9.285), and everything worked well.
Please try this version first. If the issue still exists on your side, please write back and share more details about your running environment.

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Wed Aug 23, 2017 9:37 am

But we need the OCR feature for scanned PDFs. How can we achieve this? Please suggest.

rahulkkush@gmail.com
 
Posts: 8
Joined: Tue Aug 22, 2017 2:55 pm

Wed Aug 23, 2017 9:46 am

The latest hotfix (Spire.PDF Pack (Hot Fix) Version: 3.9.285) also gives the same error.

rahulkkush@gmail.com
 
Posts: 8
Joined: Tue Aug 22, 2017 2:55 pm

Wed Aug 23, 2017 10:15 am

Hello,

Thanks for your response.
Our Spire.Pdf does not support the OCR feature, and we have no good suggestion in this respect.
As for the exception thrown by "ExtractImages()", we will look into it and reply to you ASAP.

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Wed Sep 20, 2017 8:02 am

Hi Rahul Kumar,

So sorry for the late reply.
Regarding the exception thrown by "ExtractImages()", I have reproduced it and logged it in our bug system. Once there is any update, I will inform you.

Sincerely,
Jane
E-iceblue support team

Jane.Bai
 
Posts: 1156
Joined: Tue Nov 29, 2016 1:47 am

Fri Oct 06, 2017 1:25 pm

Team, do we have OCR support now? We have just procured licenses for PDF doc and PDF net.

yogeshmsharma
 
Posts: 17
Joined: Mon Sep 25, 2017 5:25 pm

Mon Oct 09, 2017 2:42 am

Dear yogeshmsharma,

Sorry for the late reply; it was the weekend.
Sorry, but Spire.PDF doesn't support OCR at present. We have logged it as a new feature request in our system, and we will let you know once there is any progress. However, I am afraid it cannot be implemented in a short time due to its complexity.

Sincerely,
Betsy
E-iceblue support team

Betsy.jiang
 
Posts: 3099
Joined: Tue Sep 06, 2016 8:30 am

Wed Oct 11, 2017 1:01 pm

Thanks for the reply. Is there a tentative deadline, such as next month or next quarter?

yogeshmsharma
 
Posts: 17
Joined: Mon Sep 25, 2017 5:25 pm
