
Fri Jan 06, 2023 10:30 am

Hi

I am using Spire.PDF 9.1.0 to extract text from a PDF and store it in a text file.

However, the spacing between words is not preserved. For example, the PDF contains the text "Total to be collected by", which is extracted as "Total to becollectedby".

Any idea how to resolve this?

paul.oliver
 
Posts: 3
Joined: Fri Jan 06, 2023 10:17 am

Mon Jan 09, 2023 2:49 am

Hi,

Thanks for your inquiry.
I created a sample PDF file and extracted its text, but I could not reproduce the issue you mentioned: the spaces between the words were preserved. Please see the following code for reference.
Code:
            //Namespaces required by this snippet
            using System.IO;
            using Spire.Pdf;
            using Spire.Pdf.Texts;

            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile("sample.pdf");

            //Get the first page
            PdfPageBase page = doc.Pages[0];

            //Create a PdfTextExtractor object
            PdfTextExtractor textExtractor = new PdfTextExtractor(page);

            //Create a PdfTextExtractOptions object
            PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();

            //Set IsExtractAllText to true
            extractOptions.IsExtractAllText = true;

            //Extract text from the page
            string text = textExtractor.ExtractText(extractOptions);

            //Write to a txt file
            File.WriteAllText("Extracted.txt", text);

To help us investigate further, could you please provide the following information? You can send it to us via email (support@e-iceblue.com) or attach it here. Thanks in advance for your assistance.
1) Your input file and the full code that reproduces the issue.
2) Your application type, such as ConsoleApp .NET Framework 4.8.
3) Your test environment, such as OS info (e.g., Windows 10 64-bit) and region setting (e.g., China, Chinese).

Sincerely,
Triste
E-iceblue support team

Triste.Dai
 
Posts: 1000
Joined: Tue Nov 15, 2022 3:59 am

Mon Jan 16, 2023 9:17 am

Good morning,

Thank you for your response. Our code matches your test code, but we are still having the issue.

I have emailed an example document to you. As it is confidential, we are unable to share it on the open forum.

paul.oliver
 
Posts: 3
Joined: Fri Jan 06, 2023 10:17 am

Mon Jan 16, 2023 10:21 am

Hi,

Thanks for your feedback.
I tested your PDF document and reproduced the issue you mentioned. I have logged it in our bug tracking system with the ticket number SPIREPDF-5732, and our developers will investigate and fix it. Sorry for the inconvenience caused. Once the issue is fixed, I will inform you as soon as possible.

Sincerely,
Triste
E-iceblue support team

Triste.Dai
 
Posts: 1000
Joined: Tue Nov 15, 2022 3:59 am

Mon Feb 27, 2023 1:51 pm

Hi, I appreciate you are busy, but is there any news on this?

paul.oliver
 
Posts: 3
Joined: Fri Jan 06, 2023 10:17 am

Tue Feb 28, 2023 2:42 am

Hi,

Thanks for following up.
I checked the status of your issue. It has not been resolved yet, and our developers are still working on it. I have urged them to speed up the fix, and I will inform you as soon as an update is available. Thanks for your understanding.

Sincerely,
Triste
E-iceblue support team

Triste.Dai
 
Posts: 1000
Joined: Tue Nov 15, 2022 3:59 am
