Take a look at
http://search.cpan.org/~antro/PDF-111/examples/pagedump.pl. It's in the
PDF distribution. I've never used it, but it says it can parse "all possible data occuring in a PDF".
Some other options could be:
- PDF::Parse (though it doesn't look like it'll get you everywhere you want to go)
- pdf2text (there are a number of versions). You might have to convert the PDF to text first and then parse that.
- The PDF format isn't that hard to parse. I mean, if PDF::API2 can build a PDF without very much convolution (outside of Unicode and fonts), one should be able to parse it relatively easily, I would think ...
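To give a feel for that last point, here is a rough sketch (untested against real-world files) that reads only the outermost structure of a PDF: the `%PDF-x.y` header and the byte offset after the final `startxref` keyword, which is where a reader begins looking for the cross-reference data. The `pdf_skeleton` sub is a made-up name for illustration and isn't part of PDF, PDF::Parse, or PDF::API2; it uses core Perl only. Real documents add a lot on top of this (stream filters, and object/cross-reference streams from PDF 1.5 on), so treat it as a starting point rather than a parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): extract the version from
# the "%PDF-x.y" header and the byte offset that follows the last
# "startxref" keyword in the file.
sub pdf_skeleton {
    my ($bytes) = @_;

    # The header sits at the very start of the file.
    my ($version) = $bytes =~ /\A%PDF-(\d+\.\d+)/
        or die "Not a PDF: missing %PDF header\n";

    # Incrementally updated files carry several startxref entries;
    # keep the last one, which is the one a reader starts from.
    my $xref_offset;
    $xref_offset = $1 while $bytes =~ /startxref\s+(\d+)\s+%%EOF/g;
    die "No startxref/%%EOF trailer found\n" unless defined $xref_offset;

    return { version => $version, xref_offset => $xref_offset };
}

# Usage: pass a file name on the command line; the file is read in raw mode.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $bytes = do { local $/; <$fh> };
    close $fh;

    my $info = pdf_skeleton($bytes);
    print "PDF version $info->{version}, ",
          "xref data at byte $info->{xref_offset}\n";
}
```

From that offset you would go on to read the cross-reference data and then the individual objects, which is where the real work (and the font and Unicode pain) begins.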
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.