I'm trying to get the coordinates (and bounds) of every word in a PDF. I found no easier way to do this than searching for each unique word on every page with:
PdfTextFindCollection words = page.FindText(word);
foreach (PdfTextFind find in words.Finds)
{
    ...
    bound.X = find.Position.X;
    bound.Y = find.Position.Y;
}
For some reason, bound.Y is sometimes a very large negative number, such as -89463.002.
(I haven't noticed bound.X taking such values; they usually look normal, but I haven't checked all of them.)
Why is that?
(The page dimensions are Width = 612.0, Height = 792.0.)
My end goal here is to draw the page image into a PictureBox and draw red boxes around certain words.
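A minimal sketch of the scaling that drawing step would need, under stated assumptions: the word bounds and the rendered image both use a top-left origin with Y growing downward, and the image covers the whole page. `PageToImage` and `Scale` are hypothetical names, not part of any PDF library, and the word bounds in `Main` are made up. This only maps sane page coordinates to pixels; it does not explain the large negative Y values.

```csharp
using System;

static class PageToImage
{
    // Hypothetical helper: scales a rectangle given in page units
    // to pixel coordinates of a rendered image of the same page.
    // Assumption: both coordinate systems have a top-left origin.
    public static (float X, float Y, float W, float H) Scale(
        float x, float y, float w, float h,
        float pageWidth, float pageHeight,
        int imageWidth, int imageHeight)
    {
        float sx = imageWidth / pageWidth;   // horizontal pixels per page unit
        float sy = imageHeight / pageHeight; // vertical pixels per page unit
        return (x * sx, y * sy, w * sx, h * sy);
    }

    static void Main()
    {
        // A 612 x 792 page rendered at 1224 x 1584 pixels (twice the size).
        var r = Scale(100f, 200f, 30f, 12f, 612f, 792f, 1224, 1584);
        Console.WriteLine(r);
    }
}
```

With the image rendered at twice the page size, every value of the rectangle is simply doubled. If the bounds turn out to use a bottom-left origin instead, Y would have to be flipped before scaling.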
Any help?
P.S. I really don't like this method of getting word locations, because it also matches the search term inside longer words: e.g. if I search for "may", it also finds the "may" in "mayhem", "maybe", etc. If anyone has a better method, please suggest it.
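One possible workaround for the partial matches, sketched in plain .NET with no PDF library calls: check the characters directly before and after each hit and keep only hits that are not glued to a letter or digit. This assumes the surrounding line text and the hit's offset in it are available for each find, which may or may not be the case with the library. `WordMatch` and `IsWholeWord` are hypothetical names.

```csharp
using System;

static class WordMatch
{
    // Hypothetical helper: true when the match at [index, index + length)
    // is neither preceded nor followed by a letter or digit.
    public static bool IsWholeWord(string text, int index, int length)
    {
        bool startOk = index == 0 || !char.IsLetterOrDigit(text[index - 1]);
        int end = index + length;
        bool endOk = end >= text.Length || !char.IsLetterOrDigit(text[end]);
        return startOk && endOk;
    }

    static void Main()
    {
        string line = "It may cause mayhem, maybe.";
        int pos = 0;
        // Walk through every occurrence of "may" and report whether it stands alone.
        while ((pos = line.IndexOf("may", pos, StringComparison.Ordinal)) >= 0)
        {
            Console.WriteLine($"{pos}: {IsWholeWord(line, pos, 3)}");
            pos += 3;
        }
    }
}
```

In the sample line, only the standalone "may" passes the check; the occurrences inside "mayhem" and "maybe" are rejected because a letter follows them.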