As a side note, there is also a PDF::API3 available on CPAN.
I have used CAM::PDF mainly for extracting text. However, I have had little luck with HTML embedded in PDFs. You may be able to walk the root dictionary of the PDF using CAM::PDF and store the information you need. There is also a module, CAM::PDF::Renderer::Text, that may be of some help.
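For what it's worth, here is a minimal sketch of the kind of thing I mean, assuming CAM::PDF is installed from CPAN. The sub names `page_texts` and `root_keys` are just my own illustrative helpers, not part of the module, and the quality of the extracted text depends heavily on how the PDF was produced.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper: return the extracted text of each page, in page order.
sub page_texts {
    my ($file) = @_;
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";
    my @texts;
    for my $page (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($page);
        # Some pages yield nothing; store an empty string in that case.
        push @texts, defined $text ? $text : '';
    }
    return @texts;
}

# Hypothetical helper: list the top-level keys of the root dictionary,
# as a starting point for walking the document structure.
sub root_keys {
    my ($file) = @_;
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";
    my $root = $pdf->getRootDict();
    return sort keys %{$root};
}

# Only run when a file name is given on the command line.
if (my $file = shift @ARGV) {
    print "Root dictionary keys: ", join(', ', root_keys($file)), "\n";
    my $n = 0;
    for my $text (page_texts($file)) {
        $n++;
        print "--- page $n ---\n$text\n";
    }
}
```

Run it as `perl script.pl some.pdf`. The root dictionary keys will tell you which entries exist to dig into further; the values are CAM::PDF node objects that you would have to dereference yourself.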
In reply to Re^3: Extracting text from a PDF (using PDF::API2)
by tmaly
in thread Extracting text from a PDF (using PDF::API2)
by music_man1352000