When I extract Arabic text from a PDF, some of the Arabic text comes out wrong.
The original word is "تقريباً", but the extracted word is "ﺗﻘﺮﻳﺒﺎً".
API version: 9.5.6
Code:
PdfDocument pdf = new PdfDocument();
pdf.loadFromFile("out3.pdf");
PdfPageBase page = pdf.getPages().get(0);
//Create a PdfTextExtractor object
PdfTextExtractor textExtractor = new PdfTextExtractor(page);
//Create a PdfTextExtractOptions object
PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();
//Extract text from the page
String text = textExtractor.extract(extractOptions);
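The extracted word seems to be made of Arabic presentation-form code points (U+FE97, U+FED8, and so on) rather than the base letters (U+062A, U+0642, and so on), so it looks similar but does not compare equal to the original. As a possible workaround, and only a sketch that assumes this is the whole problem, the extracted string could be passed through Unicode NFKC normalization using the JDK's `java.text.Normalizer`. The class and method names `ArabicFormsNormalizer` and `toBaseLetters` are hypothetical and not part of the PDF library.

```java
import java.text.Normalizer;

public class ArabicFormsNormalizer {

    // Hypothetical helper (not part of the PDF library): NFKC applies the
    // compatibility mappings that turn Arabic presentation forms into
    // their base letters.
    static String toBaseLetters(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The extracted word from the post, written as escapes:
        // presentation forms followed by U+064B (fathatan).
        String extracted = "\uFE97\uFED8\uFEAE\uFEF3\uFE92\uFE8E\u064B";

        String normalized = toBaseLetters(extracted);

        // Print the code points so the change is visible regardless of
        // how the console renders Arabic.
        for (int i = 0; i < normalized.length(); i++) {
            System.out.printf("U+%04X ", (int) normalized.charAt(i));
        }
        System.out.println();
    }
}
```

In the snippet above, this would be applied to `text` after `textExtractor.extract(extractOptions)` returns. Note that NFKC also rewrites other compatibility characters (ligatures, full-width forms), and it does not change character order, so it may not be enough if the extracted text is also in visual rather than logical order.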