
Wed Jun 12, 2024 1:59 pm

At the moment, we pass a page number and expect to get only that page's form widgets.
The document has two text boxes, one per page.

When we select the first page and fetch its widgets,
the list comes back with every field in the document.
We want to get only the one field that is on the first page.

Sample:
- Spire.PDF version 10.6.0 (.NET)

// Iterate through each page of the document
for (int pageNumber = 0; pageNumber < document.Pages.Count; pageNumber++)
{
    // Get the page
    var page = document.Pages[pageNumber];

    // Get the widget from the page
    PdfFormWidget widget = page.Document.Form as PdfFormWidget;

    // Go through the list of widgets
    for (int i = 0; i < widget.FieldsWidget.List.Count; i++)
    {
        var field = widget.FieldsWidget.List[i] as PdfField;
        var fieldObj = new Annotation_Spire
        {
            Name = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Name,
            Page = pageNumber,
            X = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.X,
            Y = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.Y,
            Width = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.Width,
            Height = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.Height,
        };

        formFields.Add(fieldObj);
    }
}

cnavarro1109
 
Posts: 2
Joined: Wed Jun 12, 2024 1:55 pm

Thu Jun 13, 2024 2:34 am

Hello,

Thanks for your inquiry.
Kindly note that "page.Document.Form as PdfFormWidget" returns the fields of the entire document, not those of the current page. Each form field exposes the page it belongs to through its Page property, so you can filter by comparing "document.Pages.IndexOf(field.Page)" with the current page number. I have attached the modified code for your reference. If you have any questions, please feel free to write back.
Code:
List<Annotation_Spire> formFields = new List<Annotation_Spire>();

// Iterate through each page of the document
for (int pageNumber = 0; pageNumber < document.Pages.Count; pageNumber++)
{
    PdfFormWidget widget = document.Form as PdfFormWidget;

    // Go through the list of widgets
    for (int i = 0; i < widget.FieldsWidget.List.Count; i++)
    {
        var field = widget.FieldsWidget.List[i] as PdfField;
        var fieldObj = new Annotation_Spire
        {
            Name = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Name,
            Page = pageNumber,
            X = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.X,
            Y = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.Y,
            Width = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.Width,
            Height = ((Spire.Pdf.Widget.PdfStyledFieldWidget)field).Bounds.Height,

        };
        if (document.Pages.IndexOf(field.Page) == pageNumber)
        {
            formFields.Add(fieldObj);
        }   

    }
}

Sincerely,
William
E-iceblue support team

William.Zhang
 
Posts: 732
Joined: Mon Dec 27, 2021 2:23 am

Thu Jun 13, 2024 5:38 am

Thank you.
With that said, I'm having issues reading the values from certain PDF files.

I'm able to manually define/add a text box through Adobe PDF.
I then save its coordinates in a JSON file like this:

[
  {
    "Page": 0,
    "Name": "notice_market_value",
    "X": 412.799,
    "Y": 320.29,
    "Height": 22.0,
    "Width": 150.0
  },
  {
    "Page": 0,
    "Name": "scheduling_hearing_date",
    "X": 357.817,
    "Y": 396.217,
    "Height": 22.0,
    "Width": 206.29099
  }
]
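
As a rough sketch, a JSON file like the one above could be loaded back into annotation objects with System.Text.Json. The shape of the Annotation_Spire class is an assumption inferred from the properties used in this thread, and AnnotationLoader is a hypothetical helper name. System.Text.Json matches property names case-sensitively by default, which works here because the JSON keys and the C# property names are spelled the same.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Assumed shape of Annotation_Spire, inferred from the properties used in this thread.
public class Annotation_Spire
{
    public string Name { get; set; } = "";
    public int Page { get; set; }
    public float X { get; set; }
    public float Y { get; set; }
    public float Width { get; set; }
    public float Height { get; set; }
    public string Text { get; set; } = "";
}

// Hypothetical helper, not part of Spire.PDF.
public static class AnnotationLoader
{
    // Parses a JSON array of annotations. Keys that are absent from the JSON
    // (such as Text) keep their default values.
    public static List<Annotation_Spire> Parse(string json)
    {
        // Deserialize returns null when the JSON document is the literal "null".
        return JsonSerializer.Deserialize<List<Annotation_Spire>>(json)
               ?? new List<Annotation_Spire>();
    }
}
```

A call such as AnnotationLoader.Parse(System.IO.File.ReadAllText(path)), where path points at the saved JSON file, would then return one Annotation_Spire per entry.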


The issue is reading the values from a file that does not contain the text box.
These files are generated outside of our environment but use the same template.

We would like to read the values at the saved coordinates.
This is what I have so far, but it returns a blank/empty value.

public string ExtractTextFromPdf(FileStream reader, Annotation_Spire annotation)
{
    try
    {
        using (Spire.Pdf.PdfDocument doc = new Spire.Pdf.PdfDocument())
        {
            doc.LoadFromStream(reader);

            if (annotation.Page >= 0 && annotation.Page < doc.Pages.Count)
            {
                PdfPageBase currentPage = doc.Pages[annotation.Page];

                Spire.Pdf.Texts.PdfTextExtractor textExtractor =
                    new Spire.Pdf.Texts.PdfTextExtractor(currentPage);

                PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();
                extractOptions.ExtractArea = new RectangleF(annotation.X, annotation.Y, annotation.Width, annotation.Height);
                string text = textExtractor.ExtractText(extractOptions);
                return text.Trim();
            }
            else
            {
                throw new ArgumentException("Invalid page index provided.");
            }
        }
    }
    catch (Exception ex)
    {
        // Handle exceptions here
        Console.WriteLine("An error occurred: " + ex.Message);
        return string.Empty; // Or throw an exception further
    }
}

With that, I need to define which page the field will be read from.

cnavarro1109
 
Posts: 2
Joined: Wed Jun 12, 2024 1:55 pm

Thu Jun 13, 2024 7:20 am

Hello,

Thanks for your inquiry.
Do you want to get the position of the text box first and then use PdfTextExtractor to extract the text at that position? Currently, PdfTextExtractor does not support extracting content from form fields. However, the text in a text box can be read directly with "string text = textBoxField.Text;". The following code may serve as a reference. If we have misunderstood, please reply with more details about your problem and, if possible, share your test document. Thanks in advance.
Code:
for (int pageNumber = 0; pageNumber < document.Pages.Count; pageNumber++)
{
    PdfFormWidget widget = document.Form as PdfFormWidget;

    // Go through the list of widgets
    for (int i = 0; i < widget.FieldsWidget.List.Count; i++)
    {
        var field = widget.FieldsWidget.List[i] as PdfField;

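        // Skip fields that are not text boxes; the "as" cast returns null for them
        PdfTextBoxFieldWidget textBoxField = field as PdfTextBoxFieldWidget;
        if (textBoxField == null)
        {
            continue;
        }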
        var fieldObj = new Annotation_Spire
        {
            Name = textBoxField.Name,
            Page = pageNumber,
            X = textBoxField.Bounds.X,
            Y = textBoxField.Bounds.Y,
            Width = textBoxField.Bounds.Width,
            Height = textBoxField.Bounds.Height,
            Text = textBoxField.Text,

        };
        if (document.Pages.IndexOf(field.Page) == pageNumber)
        {
            formFields.Add(fieldObj);
        }
    }
}

Sincerely,
William
E-iceblue support team

William.Zhang
 
Posts: 732
Joined: Mon Dec 27, 2021 2:23 am
