Betsy.jiang wrote:
Dear MakeSense,
Thanks for your detailed information.
Sorry that I misunderstood you before. As for the order of the text, our product extracts text according to the order of the document flow. The text in the document you provided is stored in that order. Sorry, there is no way to change that order.
Regarding the invisible text issue, the overwriting method is only meant to keep white space. We also found that some text in the file (egn201620507123.pdf) is not displayed for some reason, but it is not invisible text (the text render mode value should be 3 for invisible text). This kind of text in your file will be extracted. Could you please provide another sample file so that we can check the invisible text?
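For reference, here is a minimal sketch of how the render mode check could be done by hand, assuming the page's content stream has already been decompressed into raw bytes. This is plain Python, not the Spire.PDF API; the function names and the sample stream are hypothetical. In PDF content streams the `Tr` operator sets the text rendering mode, and mode 3 means the text is neither filled nor stroked. Note that a simple regex like this can also match inside string literals, so treat the result as a hint only.

```python
import re
from typing import List

# "<integer> Tr" is the PDF operator that sets the text rendering mode.
TR_OPERATOR = re.compile(rb"(\d+)\s+Tr\b")


def text_render_modes(content: bytes) -> List[int]:
    """Return every text rendering mode set in a decoded content stream."""
    return [int(m.group(1)) for m in TR_OPERATOR.finditer(content)]


def has_invisible_text_mode(content: bytes) -> bool:
    """True if the stream switches to mode 3 (invisible text) anywhere."""
    return 3 in text_render_modes(content)


# Hypothetical decoded content stream: one hidden run, one visible run.
sample = b"BT /F1 12 Tf 3 Tr 72 700 Td (hidden) Tj ET BT 0 Tr (shown) Tj ET"
print(text_render_modes(sample))
print(has_invisible_text_mode(sample))
```

If no `3 Tr` appears, the text is probably hidden in some other way (for example covered by another object, clipped, or drawn in the background color), which would match what we saw in your file.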
Thanks,
Betsy
E-iceblue support team
Dear Betsy,
First, I want to say that I really don't have much knowledge of PDF, so I may say something that is wrong or unprofessional. If that happens, I am sorry.
As you can see in the first PDF, there are fixed vertical splits and non-fixed horizontal splits, and they cannot be selected. These splits must be some technique within the scope of PDF, so I think there must be a way to reverse-engineer them: get rid of these splits and make the text display in the usual way, so that I can get the text in the right order. So first, we should know what they are. I didn't find much useful information on the Internet with basic searching. Do you know what these splits are?
The invisible text (egn201620507123.pdf) may not be the common kind of invisible text (where the text render mode value is 3). I couldn't see it, so I called it invisible text. I deal with it by looping over each page and filtering the text to get the information I want. Here is another example; it's the Chinese version of this file. I hope it will help improve Spire.PDF to deal with this rare case.
Best Regards,
Kevin