Melvin has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to write a search engine for PDF documents, and need to read in the text of various PDF files.

I've taken a look at the various PDF modules on CPAN but, unless I'm missing something, they don't seem to allow text to be read in from the files. They just do simple processing of document data, or are concerned with editing the document.

Can anyone who has used these packages point me in the right direction?

Re: Reading PDF Files?
by t0mas (Priest) on Jun 07, 2000 at 21:52 UTC
    If you can't find a module that does what you want - try writing it yourself. Maybe lots of people would like this kind of module...
    You can find the PDF specs, among lots of other specs, at The Programmer's File Format Collection.
    (And I do think Monks should help each other hack!)

    /brother t0mas
Re: Reading PDF Files?
by Melvin (Beadle) on Jun 13, 2000 at 11:23 UTC
    Well, for what it's worth, I did manage to find a package that converts PDF files to text, which makes indexing possible. Still, it would be much better (and cooler) to be able to suck the text out of the PDFs with our favorite language.

    Maybe I'll get ambitious and figure out the structure and do it by hand, but I'm not sure I'm ready (Perl-wise) to tackle that kind of package. Laziness, hubris ...

      Melvin,

      Good news that you found the package you needed, but how about cluing the rest of us in on which package it was? That way others (myself included) can benefit from this thread.
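
For anyone landing here later, the convert-then-index approach Melvin describes might look roughly like the sketch below. The thread never names the converter, so the pdftotext command used in pdf_to_text is purely an assumption (any tool that writes plain text to STDOUT would do), and the function names pdf_to_text, build_index and search_index are made up for illustration. The index itself is a plain hash of word => { document => count }.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run an external converter and return its output.
# ASSUMPTION: a `pdftotext` binary is on the PATH; the thread does not
# say which converter was actually used.
sub pdf_to_text {
    my ($path) = @_;
    open(my $fh, '-|', 'pdftotext', $path, '-')
        or die "cannot run pdftotext: $!";
    local $/;                       # slurp the whole output at once
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $path";
    return defined $text ? $text : '';
}

# Build an inverted index: word => { document name => occurrence count }.
sub build_index {
    my ($docs) = @_;                # hash ref: document name => text
    my %index;
    for my $name (keys %$docs) {
        # Lowercase the text and take every run of word characters.
        for my $word (lc($docs->{$name}) =~ /(\w+)/g) {
            $index{$word}{$name}++;
        }
    }
    return \%index;
}

# Return the documents containing a word, most occurrences first
# (ties broken by name). Call this in list context.
sub search_index {
    my ($index, $word) = @_;
    my $hits = $index->{ lc $word } || {};
    return sort { $hits->{$b} <=> $hits->{$a} || $a cmp $b } keys %$hits;
}

# Usage with text that has already been extracted:
my $index = build_index({
    'a.pdf' => 'Perl reads text. Perl indexes text.',
    'b.pdf' => 'Plain text about PDF files.',
});
print join(', ', search_index($index, 'text')), "\n";
```

With real files you would fill the hash with something like map { $_ => pdf_to_text($_) } @files. Keeping extraction separate from indexing means the converter can be swapped for a pure-Perl parser later without touching the search code.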