Extracted Garbage Text

Wed Sep 17, 2025 4:01 am

Hi, I'm trying to extract text from several PDF files. Some extractions succeeded and some did not. The failed extractions produced something like "\u0010\u0003\f\r\u000e\u0011\u0012\u0013\u0013\u0012\t\u0014\0" instead of regular text.

I have uploaded the PDF files and my code. As you can see, when the program reaches strPageText = textExtractor.ExtractText(extractOptions); the value of strPageText is set to something similar to the above.

Please help me identify the problem and advise what to do.

Thank you very much in advance!

Steven

P.S.: I'm using Spire.PDF 11.8.7 for .NET. My computer runs Windows 11 Pro, 64-bit, and the region setting is English (United States). My application is developed in C# on .NET Framework 4.8 with Visual Studio 2022.

Wed Sep 17, 2025 5:39 am

Hello,

Thank you for your inquiry.
We are unable to get your code and files. The forum may have file size limitations that prevented a successful upload. Could you upload them to Dropbox or OneDrive and share the download link with us, or send your code and files to my email address: [email protected]? Thank you in advance.