heezy has asked for the wisdom of the Perl Monks concerning the following question:
Hi
I have searched around a lot on PerlMonks, CPAN and Google for modules that will let me grab the text out of a PDF document and then save it to a file.
I found the following things
But none of these have answered my question! The two PerlMonks posts don't actually contain any resources; they are just rants, flaming and people's opinions. I thought the PDF::API2 module could solve my problem, as it has a "stringify" method, but this just returns the ASCII of the PDF in its raw form, still encoded and weird!
I need to do this programmatically, as I need to extract the first 400 words of each of 4,500 PDF documents to create an abstract describing each doc. If there were fewer docs I would open each one and copy and paste, but there is no way I am doing that for 4,500 documents!
Thanks people
I hope someone can help!
M
(running on Solaris 9, SPARC, etc.)
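One possible way to approach the batch job described above, as a minimal sketch rather than a tested solution: shell out to the external `pdftotext` program that ships with Xpdf and keep only the first 400 words of its output. The sketch assumes `pdftotext` is installed and on the PATH, that Perl is 5.8 or later (for the list form of piped `open`), and that the PDFs contain real text rather than scanned images. The helper names `first_words` and `pdf_to_text` and the `.abstract.txt` output suffix are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first $limit whitespace-separated words of $text,
# joined by single spaces.
sub first_words {
    my ($text, $limit) = @_;
    my @words = split ' ', $text;    # split ' ' also skips leading whitespace
    $#words = $limit - 1 if @words > $limit;
    return join ' ', @words;
}

# Convert one PDF to plain text by running the external pdftotext program.
# ASSUMPTION: pdftotext (from Xpdf) is on the PATH; an output name of "-"
# makes it write the text to stdout.
sub pdf_to_text {
    my ($pdf) = @_;
    open my $fh, '-|', 'pdftotext', $pdf, '-'
        or die "cannot run pdftotext: $!";
    local $/;                        # slurp the whole output at once
    my $text = <$fh>;
    close $fh or warn "pdftotext failed on $pdf\n";
    return defined $text ? $text : '';
}

# Write a 400-word abstract next to each PDF named on the command line.
# Files whose names do not end in .pdf are skipped.
for my $pdf (@ARGV) {
    (my $out = $pdf) =~ s/\.pdf$/.abstract.txt/i or next;
    open my $ofh, '>', $out or die "cannot write $out: $!";
    print {$ofh} first_words(pdf_to_text($pdf), 400), "\n";
    close $ofh;
}
```

If this were saved as, say, `abstracts.pl` (a hypothetical name), it could be run as `perl abstracts.pl *.pdf`. With 4,500 files the shell's argument-length limit may get in the way, in which case feeding the names through `find` and `xargs` in smaller batches is one workaround.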
Re: pdf -> text
by crenz (Priest) on Mar 13, 2003 at 22:01 UTC

Re: pdf -> text
by traveler (Parson) on Mar 13, 2003 at 21:39 UTC