My Fellow Monks,
After searching with Google and Super Search and turning up nothing of value, I started looking through all of the documentation for Text::PDF and PDF::API2. Try as I might, I cannot find a way to use Perl to extract the words from a PDF document as plain text or HTML. Heck, all I see is how to create new PDF files or modify existing ones...
I did, however, find pdf2html. It would be a breeze to run it and then extract the data from the HTML it produces.
<whine>But! I don't want to...</whine>
I know that many of you have worked with PDF files (otherwise there would not be so many hits when doing a Super Search). <begging>Please give me guidance!</begging>
Can I indeed use one of the modules mentioned to do this?
<hounding>Can I? Can I? Huh? Huh?
Pretty please with sugar on top?
I promise I'll be good (at least until after Christmas).</hounding>
:^)
In reply to Extracting the data from a PDF by Mr. Muskrat