in reply to How toread the contents of PDF

Hi guys, I am developing a search engine. Can anybody tell me whether there is any way to read the contents of a PDF file? Is there any package available that can read the entire content of a PDF from inside a Perl script? Please let me know. Thank you.

Probably, the best thing is to go to CPAN and search for pdf there. If you have difficulty finding a suitable package, or are unsure about the relative merits of the candidates when several are available, you're welcome to ask for clarification here.

Re^2: How toread the contents of PDF
by stvn (Monsignor) on Oct 18, 2006 at 12:16 UTC
    Probably, the best thing is to go to CPAN and search for pdf there.

    Unfortunately, nothing capable of extracting text from a PDF file (at least from what I can see from the POD docs) can be found until the 27th page of the CPAN search. Searching CPAN can sometimes be an enormous waste of time, and asking here can save a lot of time and hassle.

    UPDATE: I will acknowledge that PDF::API2 can stringify a PDF, but personal experience (something which is not found on search.cpan.org, and is easily found here on PerlMonks) has been that it does not work as expected all the time. There are also several other modules with misleading names, like PDF::Parse and PDF::Extract. My main point is that sometimes it can save you a whole lot of time to just ask :)

    -stvn

        CAM::PDF was the module I was referring to on the 27th page. I also recommended it in my response to the OP below.

        -stvn
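
For anyone landing on this thread, here is a minimal sketch of what text extraction with CAM::PDF might look like. It assumes CAM::PDF is installed from CPAN; the helper name extract_pdf_text() is made up for illustration and is not part of the module. How clean the output is depends heavily on how the PDF was produced, so treat it as a starting point rather than a finished indexer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;    # from CPAN; not part of core Perl

# Return the text of every page in a PDF as one string.
# extract_pdf_text() is a hypothetical helper name, not part of CAM::PDF.
sub extract_pdf_text {
    my ($path) = @_;

    my $pdf = CAM::PDF->new($path)
        or die "Cannot open '$path': $CAM::PDF::errstr\n";

    my @pages;
    for my $n (1 .. $pdf->numPages()) {
        # Guard against pages that yield no extractable text
        my $text = $pdf->getPageText($n);
        push @pages, defined $text ? $text : '';
    }
    return join "\n", @pages;
}

# Run only when invoked directly with a file argument
if (!caller && @ARGV) {
    print extract_pdf_text($ARGV[0]);
}
```

Saved as, say, pdftext.pl, it could be run as `perl pdftext.pl some.pdf` and the output piped into whatever the search engine uses for indexing.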