Spire.PDF is a professional PDF library for creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++ and Python.

Wed Sep 17, 2025 4:01 am

Hi, I'm trying to extract text from several PDF files. Extraction works for some files but not for others. The failed ones produce something like "\u0010\u0003\f\r\u000e\u0011\u0012\u0013\u0013\u0012\t\u0014\0" instead of regular text.

I have uploaded the PDF files and my code. As you can see, after the line strPageText = textExtractor.ExtractText(extractOptions); runs, strPageText holds a string similar to the one above.
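While waiting for a diagnosis, it can help to list exactly which files and pages are affected. Below is a minimal C# sketch that flags extracted text dominated by control characters, assuming the garbled output always looks like the sample above. The class GarbledTextCheck, the method LooksGarbled and the 0.3 threshold are hypothetical choices for illustration. They are not part of Spire.PDF.

```csharp
using System;

// Hypothetical helper, not part of Spire.PDF: flags extracted page text that is
// dominated by control characters, like the garbled sample quoted above.
public static class GarbledTextCheck
{
    // Returns true when control characters make up at least `threshold`
    // of the non-whitespace characters in `text`.
    // The 0.3 default is an assumed cut-off; tune it for your own files.
    public static bool LooksGarbled(string text, double threshold = 0.3)
    {
        if (string.IsNullOrEmpty(text))
            return false;

        int counted = 0;
        int control = 0;
        foreach (char c in text)
        {
            // Spaces, tabs and line breaks appear in normal text, so ignore them.
            if (char.IsWhiteSpace(c))
                continue;

            counted++;
            // char.IsControl covers U+0000-U+001F and U+007F-U+009F.
            if (char.IsControl(c))
                control++;
        }

        // Text made only of whitespace is treated as empty, not garbled.
        if (counted == 0)
            return false;

        return (double)control / counted >= threshold;
    }
}
```

In the extraction loop, a check such as if (GarbledTextCheck.LooksGarbled(strPageText)) { ... } right after the ExtractText call could record the file name and page index. Skipping whitespace before computing the ratio keeps pages with many line breaks from being counted as garbled.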

Please help me identify the problem and advise me on what to do.

Thank you very much in advance!

Steven

P.S.: I'm using Spire.PDF 11.8.7 for .NET on Windows 11 Pro (64-bit), with the region set to English (United States). My application is written in C# with Visual Studio 2022 and targets .NET Framework 4.8.

bosdasteven

Wed Sep 17, 2025 5:39 am

Hello,

Thank you for your inquiry.
We could not find your code and files in the post. The forum may have a file size limit that prevented the upload. You could upload them to Dropbox or OneDrive and share the download link with us, or send your code and files to my email address: [email protected]. Thank you in advance.
Sincerely,
Talia
E-iceblue support team

talia.liu
