FreeSpire.PDF

Fri Jan 03, 2025 9:31 am

Is there any way I can search for a text in the PDF by removing the special characters and spaces in the text to retrieve the coordinates of the text?

Fri Jan 03, 2025 10:04 am

Hello,

Thank you for your inquiry.

I recommend that you try using a regular expression to search through the text. Here is an example code snippet:

PdfDocument doc = new PdfDocument();

// Read a PDF file
doc.LoadFromFile(input);

// Get the first page of the PDF file
PdfPageBase page = doc.Pages[0];

// Create a PdfTextFinder object for searching text within the first page
PdfTextFinder finder = new PdfTextFinder(page);
finder.Options.Parameter = Spire.Pdf.Texts.TextFindParameter.Regex;

// Find occurrences of the specified text within the first page
List<PdfTextFragment> finds = finder.Find("hello[\\s\\S]*world");

// Create a brush
PdfBrush brush = new PdfSolidBrush(Color.DarkBlue);

// Define a font
PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("Arial", 7f, FontStyle.Bold));

// Define a horizontally and vertically centered text format
PdfStringFormat centerAlign = new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle);

RectangleF rec;

// Iterate through each found text fragment
foreach (PdfTextFragment find in finds)
{
    // Get the bounds of the found text on the page
    rec = find.Bounds[0];
    float x = rec.X;
    float y = rec.Y;
}

If you need further assistance or have more questions, feel free to ask!

Sincerely,
Amy
E-iceblue support team

Tue Jan 07, 2025 5:34 am

Hi Amy,

I have the search term in a JSON file and need to find it in the PDF file, ignoring spaces and special characters, to get the coordinates of the text in the PDF. I can remove the special characters and spaces from the search term, but is it possible to remove them from the PDF text as well, so that the two match and I can retrieve the coordinates of the text from the PDF? Can you let me know how this can be done? Below is an example of a text that needs to be searched for in the PDF:

Search Term in json file: BMI : 31.0 - 39.0 , adult
PDF Text: BMI:31.0-39.0, adult

I am currently using the code below to search for text in the PDF:

PdfTextFinder finder = new PdfTextFinder(page);
PdfTextFindOptions options = new PdfTextFindOptions();
options.Parameter = Spire.Pdf.Texts.TextFindParameter.IgnoreCase;
finder.Options = options;
List<PdfTextFragment> fragments = finder.Find(SearchKeyTerm);

Tue Jan 07, 2025 7:50 am

Hello,

Thank you for your feedback.

Based on your requirements, we have prepared the following demo for your reference. Please note that it is not possible to remove the original text from the PDF content directly. Instead, the demo draws a white rectangle over the target text, which makes it visually invisible.

PdfDocument doc = new PdfDocument();

// Read a PDF file
doc.LoadFromFile(path + "1.pdf");

// Get the first page of the PDF file
PdfPageBase page = doc.Pages[0];

// Create a PdfTextFinder object for searching text within the first page
PdfTextFinder finder = new PdfTextFinder(page);
finder.Options.Parameter = Spire.Pdf.Texts.TextFindParameter.Regex;

String regex = "BMI\\s*:\\s*\\d+(?:\\.\\d+)?\\s*-\\s*\\d+(?:\\.\\d+)?\\s*,\\s*adult";

// Find occurrences of the specified text within the first page
List<PdfTextFragment> finds = finder.Find(regex);

// Create a brush
PdfBrush brush = new PdfSolidBrush(Color.DarkBlue);

// Define a font
PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("Arial", 7f, FontStyle.Bold));

// Define a horizontally and vertically centered text format
PdfStringFormat centerAlign = new PdfStringFormat(PdfTextAlignment.Center, PdfVerticalAlignment.Middle);

RectangleF rec;

// Iterate through each found text fragment
foreach (PdfTextFragment find in finds)
{
    // Get the bounds of the found text on the page
    rec = find.Bounds[0];

    // Cover the found text with a white rectangle
    page.Canvas.DrawRectangle(PdfBrushes.White, rec);

    // Draw new text (here an empty string) with the defined font and color
    page.Canvas.DrawString("", font, brush, rec);
}

doc.SaveToFile(path + "result.pdf");
doc.Close();
doc.Dispose();
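Since the search term comes from a JSON file, the regular expression could also be generated from the term rather than written by hand. The following is only a minimal sketch under some assumptions: `SearchPatternBuilder` and `BuildLoosePattern` are hypothetical names that are not part of Spire.PDF, the helper uses only standard .NET classes, and it assumes the finder's Regex mode accepts .NET-style patterns such as `\s*` (as the hand-written pattern above does). It tolerates differences in whitespace only, so punctuation still has to match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Spire.PDF.
public static class SearchPatternBuilder
{
    // Builds a regex pattern that matches the search term regardless of
    // how much whitespace (including none) appears between its characters.
    // Note: works character by character, so it assumes the term contains
    // no surrogate pairs (characters outside the Basic Multilingual Plane).
    public static string BuildLoosePattern(string searchTerm)
    {
        if (string.IsNullOrWhiteSpace(searchTerm))
        {
            throw new ArgumentException("Search term must not be empty.", nameof(searchTerm));
        }

        // 1. Drop all whitespace from the term.
        // 2. Escape each remaining character so that '.', '(' etc. are literal.
        // 3. Allow optional whitespace between every pair of characters.
        var escapedChars = searchTerm
            .Where(c => !char.IsWhiteSpace(c))
            .Select(c => Regex.Escape(c.ToString()));

        return string.Join("\\s*", escapedChars);
    }
}
```

The resulting pattern for "BMI : 31.0 - 39.0 , adult" matches both that string and "BMI:31.0-39.0, adult" when tested with `System.Text.RegularExpressions.Regex`. It could then be passed to `finder.Find(...)` in place of the hard-coded `regex` string, with `finder.Options.Parameter` set to `TextFindParameter.Regex` as in the demo above.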

If the above example does not meet your needs, please provide us with further feedback.

Sincerely,
Amy
E-iceblue support team

Mon Feb 17, 2025 10:47 am

Hi. When I extract text with the code below, FreeSpire.PDF fails to extract the text highlighted in red in the attached section of the PDFs. Please advise.

PdfTextExtractor textExtractor = new PdfTextExtractor(page);

//Create a PdfTextExtractOptions object
PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();

//extractOptions.IsSimpleExtraction = true;

//Set IsExtractAllText to true
extractOptions.IsExtractAllText = true;

//Extract text from the page
string pdftextcontent = textExtractor.ExtractText(extractOptions);

Tue Feb 18, 2025 2:28 am

Hi,

Thanks for your inquiry.
Please provide your input PDF file to help me investigate your issue accurately and quickly. You can upload your file here or send it to my email ([email protected]).

Sincerely,
Nina
E-iceblue support team