Wed Oct 28, 2020 12:38 am

I am trying to load a PDF that contains editable form fields and interactive buttons and extract the text from its pages. However, the output is just unreadable characters, and I cannot find a solution. I have attached the PDF and an image of the output, and my code is below. Any help would be appreciated; thanks in advance.

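using System;
using System.Text;
using Spire.Pdf;

// Wrapper class name is hypothetical; the post shows only the members below.
public sealed class PdfTextReader : IDisposable
{
    private readonly StringBuilder _builder = new StringBuilder();
    private PdfDocument _document;

    public void LoadFromFile(string fileName)
    {
        // Release any previously loaded document before opening a new one.
        Dispose();
        _document = new PdfDocument(fileName);
    }

    public string Read()
    {
        if (_document == null)
        {
            throw new InvalidOperationException("Call LoadFromFile before Read.");
        }

        _builder.Clear();

        // Append the text layer of every page, separating pages with a space.
        for (int p = 0; p < _document.Pages.Count; p++)
        {
            string content = _document.Pages[p].ExtractText();
            _builder.Append(content);
            _builder.Append(" ");
        }

        return _builder.ToString();
    }

    public void Dispose()
    {
        // Close the current document, if any, so the file is released.
        _document?.Close();
        _document = null;
    }
}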

tolgapasin

Wed Oct 28, 2020 2:04 am

Hello,

Thanks for your inquiry.
I was able to reproduce the issue. However, I found that even Adobe returns unreadable characters when extracting the text of your file, and copying the text and pasting it elsewhere produces unreadable characters as well. You can verify this on your end.
Since the problem lies in the file itself, Spire.PDF cannot extract readable text from such PDF files either.

Sincerely,
Brian
E-iceblue support team
Brian.Li
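The reply above traces the unreadable output to the PDF itself, so a program that processes many files may want to notice such results instead of passing them on silently. One common cause of this kind of output is a font whose glyphs carry no usable Unicode mapping, although the thread does not confirm that for the attached file. The following is a minimal sketch of a heuristic check, not part of Spire.PDF: the class GarbledTextHeuristic, its methods and the 0.3 threshold are all hypothetical assumptions. It flags text in which many characters are private-use, control or U+FFFD replacement characters; garbage that happens to consist of ordinary but wrong letters will not be caught.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Spire.PDF: a rough check for extracted
// text that consists largely of "unreadable characters".
public static class GarbledTextHeuristic
{
    // Returns the share of non-whitespace characters that are unlikely to be
    // real text: private-use code points (BMP only), control characters, or
    // the U+FFFD replacement character.
    public static double SuspiciousRatio(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0.0;
        }

        int considered = 0;
        int suspicious = 0;

        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                continue; // spaces, tabs and line breaks say nothing either way
            }

            considered++;

            UnicodeCategory category = char.GetUnicodeCategory(c);
            if (category == UnicodeCategory.PrivateUse
                || category == UnicodeCategory.Control
                || c == '\uFFFD')
            {
                suspicious++;
            }
        }

        return considered == 0 ? 0.0 : (double)suspicious / considered;
    }

    // The 0.3 default threshold is an arbitrary assumption; tune it on real files.
    public static bool LooksGarbled(string text, double threshold = 0.3)
    {
        return SuspiciousRatio(text) > threshold;
    }
}
```

For example, the string returned by Read() could be passed to GarbledTextHeuristic.LooksGarbled, and files that fail the check could be logged or routed to a different extraction approach such as OCR.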
