Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.

Wed Jun 14, 2023 3:42 am

Hi,
I'm extracting Arabic text from a PDF, and some of the extracted Arabic text is not correct.
The original word is "تقريباً", but the extracted word is "ﺗﻘﺮﻳﺒﺎ‎ً".

API version: 9.5.6
Code:
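import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.texts.PdfTextExtractOptions;
import com.spire.pdf.texts.PdfTextExtractor;

// Load the PDF document
PdfDocument pdf = new PdfDocument();
pdf.loadFromFile("out3.pdf");

// Get the first page
PdfPageBase page = pdf.getPages().get(0);
// Create a PdfTextExtractor object
PdfTextExtractor textExtractor = new PdfTextExtractor(page);
// Create a PdfTextExtractOptions object
PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();
// Extract text from the page
String text = textExtractor.extract(extractOptions);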


jinlongri
 
Posts: 1
Joined: Wed Jun 14, 2023 3:14 am

Wed Jun 14, 2023 6:41 am

Hi,

Thanks for your feedback.
After testing, I reproduced the issue you described and logged it in our issue tracking system as ticket SPIREPDF-6060. Our developers will investigate and fix it. Sorry for the inconvenience; I will let you know as soon as the issue is fixed.
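
Until a fix is available, one possible workaround is sketched here. It is only a sketch and has not been confirmed against out3.pdf. It assumes the extracted string is made of Arabic presentation-form characters (the U+FE70 to U+FEFF range) and may contain an invisible directional mark, rather than the base Arabic letters. If so, Unicode NFKC normalization from the standard java.text.Normalizer class should map the characters back to their base letters. The class ArabicTextNormalizer and the method toBaseLetters are hypothetical names used for illustration; they are not part of Spire.PDF.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of the Spire.PDF API.
final class ArabicTextNormalizer {
    private ArabicTextNormalizer() {}

    // Maps Arabic presentation-form characters in extracted text to base letters.
    static String toBaseLetters(String extracted) {
        // NFKC replaces compatibility characters, including Arabic
        // presentation forms, with their base-letter equivalents.
        String normalized = Normalizer.normalize(extracted, Normalizer.Form.NFKC);
        // Assumption: directional marks and embedding/isolate controls in the
        // extracted text are unwanted, so they are removed here.
        return normalized.replaceAll("[\\u200E\\u200F\\u202A-\\u202E\\u2066-\\u2069]", "");
    }
}
```

It could be applied to the result of the extraction call, for example ArabicTextNormalizer.toBaseLetters(text). Note that NFKC also changes other compatibility characters (ligatures, full-width digits and so on), so it may not be suitable if the extracted text has to be preserved exactly.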

Best regards,
Triste
E-iceblue support team
Triste.Dai
 
Posts: 1000
Joined: Tue Nov 15, 2022 3:59 am
