Hello Monks,
I would like to parse a rather simple but large PDF file.
I can copy and paste the content page by page, so the text is stored as real text, not as images.
I looked at the PDF-API2 documentation and found it very unwieldy. How would you approach parsing the text content of a PDF document? Do you have any hints on what I should look at? I found a lot about creating PDFs, but nothing about parsing them.
Thanks!
Update:
I want to stress that no images are involved and that I can use Windows' copy-and-paste function.
For the moment I have implemented an AutoIt solution that creates a text file from around 4000 copy-and-paste operations. I would like to have a clean solution for the future.