Spire.PDF is a professional PDF library for creating, writing, editing, handling, and reading PDF files without any external dependencies. Get free, professional technical support for Spire.PDF for .NET, Java, Android, C++, and Python.

Sat Feb 01, 2025 4:57 am

I am using Spire.Pdf for .NET 10.2.2.0 to extract text from tables in a PDF file.
The problem is that the PDF contains text in what appears to be a background layer, as shown below:
As a result, when I extract text from a cell, it also returns the background layer's text: "열", "람" and "용".
How should I change my code to exclude this text?


PdfTableExtractor _tableExtractor = new PdfTableExtractor(doc);
PdfTable[] _tableList = null;
string _title = string.Empty;

for (int i = 0; i < doc.Pages.Count; i++)
{
    // extract from table
    _tableList = _tableExtractor.ExtractTable(i);
    int _tableNumber = 0;

    if (_tableList != null && _tableList.Length > 0)
    {
        foreach (PdfTable _table in _tableList)
        {
            int _row = _table.GetRowCount();
            int _column = _table.GetColumnCount();

            for (int j = 0; j < _row; j++)
            {
                for (int k = 0; k < _column; k++)
                {
                    string _text = _table.GetText(j, k);
                }
            }
        }
    }
}

tilkoblet
 
Posts: 1
Joined: Thu Jan 23, 2025 9:39 am

Tue Feb 04, 2025 8:01 am

Hello,

Thanks for your inquiry.
Apologies for the delayed response, as we have been on the Spring Festival holiday from January 28th to February 4th. To help us investigate further, please provide us with your input file. Thank you in advance.

Sincerely,
William
E-iceblue support team

William.Zhang
 
Posts: 732
Joined: Mon Dec 27, 2021 2:23 am
