Hi,
Thanks for your inquiry.
I created a test PDF file and extracted its text, but I could not reproduce the issue you described: the spaces between words were preserved. Please refer to the following code.
using System.IO;
using Spire.Pdf;
using Spire.Pdf.Texts;

//Create a PdfDocument object
PdfDocument doc = new PdfDocument();
//Load a PDF file
doc.LoadFromFile("sample.pdf");
//Get the first page
PdfPageBase page = doc.Pages[0];
//Create a PdfTextExtractor object
PdfTextExtractor textExtractor = new PdfTextExtractor(page);
//Create a PdfTextExtractOptions object
PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();
//Set IsExtractAllText to true to extract all text on the page
extractOptions.IsExtractAllText = true;
//Extract text from the page
string text = textExtractor.ExtractText(extractOptions);
//Write to a txt file
File.WriteAllText("Extracted.txt", text);
//Close the document and release resources
doc.Close();
To help us investigate further, could you please provide the following information? You can send it to us via email (support@e-iceblue.com) or attach it here. Thank you in advance for your assistance.
1) Your input file and the full code that reproduces the issue.
2) Your application type, such as a .NET Framework 4.8 console app.
3) Your test environment, such as OS information (e.g., Windows 10 64-bit) and region settings (e.g., China, Chinese).
Sincerely,
Triste
E-iceblue support team