arunmep has asked for the wisdom of the Perl Monks concerning the following question:

Re: How to read the contents of PDF
by GrandFather (Saint) on Oct 18, 2006 at 09:15 UTC
Re: How to read the contents of PDF
by stvn (Monsignor) on Oct 18, 2006 at 12:06 UTC
      The link is not working.

        The link is not working because it is pinned to one specific release, CAM::PDF 1.08. The current version (at the time of writing) is 1.60, and a correct link is not hard to find.

        I just tried to put a "permalink" to the latest version:

        http://search.cpan.org/dist/CAM-PDF/bin/getpdftext.pl links to the version of the script in the latest distribution.
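
For readers who cannot reach the script, here is a minimal sketch of page-by-page text extraction with CAM::PDF. It assumes CAM::PDF is installed from CPAN; `pdf_text` is a hypothetical helper name, and this is not the code of getpdftext.pl itself.

```perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper: return the text of every page as one string.
sub pdf_text {
    my ($file) = @_;

    # new() returns undef on failure and sets $CAM::PDF::errstr.
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";

    my @pages;
    for my $page (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($page);
        push @pages, $text if defined $text;
    }
    return join "\n", @pages;
}

# Print the text of the PDF named on the command line, if any.
print pdf_text($ARGV[0]) if @ARGV;
```

Extraction quality depends on how the PDF was produced, so for a search engine it is probably worth checking the output on a few representative files first.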

Re: How to read the contents of PDF
by blazar (Canon) on Oct 18, 2006 at 09:34 UTC
    Hi guys, I am developing a search engine. Can anybody tell me whether there is any way to read the contents of a PDF file? Is there any package available that can read the entire content of a PDF from inside a Perl script? Please let me know. Thank you.

    Probably, the best thing is to go to CPAN and search for pdf there. If you have difficulties locating one suitable package or are uncertain about the relative merits of some of them in case many are available, then you're welcome to ask for clarifications here.

      Probably, the best thing is to go to CPAN and search for pdf there.

      Unfortunately, nothing capable of extracting text from a PDF file (at least from what I can see from the POD docs) can be found until the 27th page of the CPAN search. Searching CPAN can sometimes be an enormous waste of time, and asking here can save a lot of time and hassle.

      UPDATE: I will acknowledge that PDF::API2 can stringify a PDF, but personal experience (something which is not found on search.cpan.org, and is easily found here on PerlMonks) has been that it does not always work as expected. There are also several other modules with misleading names, like PDF::Parse and PDF::Extract. My main point is that sometimes it can save you a whole lot of time to just ask :)

      -stvn
Re: How to read the contents of PDF
by arunmep (Beadle) on Oct 20, 2006 at 06:19 UTC
    Hi Monks, after an exhaustive search I found a workaround that solved my problem. I got pdf2text.exe from the Google Desktop software, extracted the contents of each PDF file into a variable, searched the contents, and deleted the file. Thanks for your replies.
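
A workaround of this kind could be sketched roughly as follows. The thread does not say how pdf2text.exe is invoked, so the "converter input output" calling convention is an assumption, as is the reading that the deleted file is the intermediate text file. `extract_text` and `pdf_contains` are hypothetical names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Run an external converter, slurp its output into a variable, and
# delete the intermediate text file. ASSUMPTION: the converter is
# called as "<command...> input.pdf output.txt".
sub extract_text {
    my ($pdf_file, @converter) = @_;

    my ($fh, $txt_file) = tempfile(SUFFIX => '.txt');
    close $fh;

    my $status = system(@converter, $pdf_file, $txt_file);
    if ($status != 0) {
        unlink $txt_file;
        die "Converter failed with status $status\n";
    }

    open my $in, '<', $txt_file or die "Cannot read $txt_file: $!\n";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close $in;
    unlink $txt_file;

    return defined $text ? $text : '';
}

# Case-insensitive substring search over the extracted text.
sub pdf_contains {
    my ($pdf_file, $term, @converter) = @_;
    my $text = extract_text($pdf_file, @converter);
    return index(lc $text, lc $term) >= 0 ? 1 : 0;
}
```

A call might look like `pdf_contains('report.pdf', 'perl', 'pdf2text.exe')`, with the command adjusted to whatever arguments the converter really expects.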