Mr. Muskrat has asked for the wisdom of the Perl Monks concerning the following question:
My Fellow Monks,
After using Google and Super Search and turning up nothing of value, I started looking through all of the documentation for Text::PDF and PDF::API2. Try as I might, I cannot find a way to use Perl to extract the words from a PDF document as plain text or HTML. Heck, all I see is how to make new PDF files or change existing ones...
I did, however, find pdf2html. It would be a breeze to run it and then extract the data from the HTML that it produces.
<whine>But! I don't want to...<\whine>
I know that many of you have worked with PDF files (otherwise there would not be so many hits when doing a Super Search). <begging>Please give me guidance!<\begging>
Can I indeed use one of the modules mentioned to do this?
<hounding>Can I? Can I? Huh? Huh?
Pretty please with sugar on top?
I promise I'll be good (at least until after Christmas).<\hounding>
:^)
Replies are listed 'Best First'.
Re: Extracting the data from a PDF
by Ovid (Cardinal) on Aug 29, 2002 at 21:55 UTC

Re: (nrd) Extracting the data from a PDF
by newrisedesigns (Curate) on Aug 29, 2002 at 21:54 UTC

Re: Extracting the data from a PDF
by Mr. Muskrat (Canon) on Aug 29, 2002 at 22:07 UTC

Re: Extracting the data from a PDF
by Mr. Muskrat (Canon) on Aug 30, 2002 at 22:14 UTC