user123 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am a new member here, hoping to get some good help from the wise monks. Does anyone know of a Perl module I can use to search for a string within a PDF file? For example, I would like to search for "Perl is cool" in a PDF file. If the string is present in the PDF, it should return true, otherwise false. Thanks, Sagar

Re: Search within a PDF file
by friedo (Prior) on Jan 27, 2005 at 20:16 UTC
      You'll note that those two modules are linked from CPAN.org, the Comprehensive Perl Archive Network.

      Are you familiar with CPAN? Its search is very useful for finding modules and documentation (another good source is ActiveState).

      Ardemus - "This, right now, is our lives."
      I haven't tried them yet, but the PDF and PDF::Parse APIs do not have the capability to search within a PDF file. The documentation shows that only the PDF document properties can be retrieved. I plan to try them out tonight anyway. Thanks for your help. -Sagar
Re: Search within a PDF file
by dragonchild (Archbishop) on Jan 27, 2005 at 20:27 UTC
    Have you tried grep? Strings, if I remember correctly, are stored as plain text within the PDF format ...
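
The grep idea can be sketched in Perl as a plain substring search over the file's raw bytes. This is a minimal sketch, not a tested solution: `raw_contains` is a hypothetical helper name, and it only finds text that is stored uncompressed and in one piece.

```perl
use strict;
use warnings;

# Return 1 if $needle occurs verbatim in the raw bytes of the file at $path,
# otherwise 0. Finds only text that is stored uncompressed in the file.
sub raw_contains {
    my ($path, $needle) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                  # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return index($bytes, $needle) >= 0 ? 1 : 0;
}
```

A call such as `raw_contains('some.pdf', 'Perl is cool')` (the file name is only a placeholder) would give the true/false answer asked for, within the limits noted above.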

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      Much of what I see in PDF files is enclosed in 'stream' blocks, which appear to use a compression encoding, so grep won't do it. When I am forced to do this myself, I sure hope one of the modules mentioned above, or some other, will take care of pulling out the text I need to look at. (Oh, I'm not looking forward to this!)
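
For the compressed 'stream' blocks, one possible approach is to inflate each block before searching it. The sketch below assumes the streams use zlib/Flate compression (the common `/FlateDecode` filter) and that the text sits in the stream as one literal string. Neither is guaranteed: PDFs can use other filters, encryption, custom font encodings, or strings split up for kerning. `streams_contain` is a hypothetical helper name; `Compress::Zlib` is the real module used for decompression.

```perl
use strict;
use warnings;
use Compress::Zlib qw(uncompress);

# Return 1 if $needle appears in any zlib-compressed stream inside $pdf_bytes,
# otherwise 0. Streams that do not inflate cleanly are skipped.
sub streams_contain {
    my ($pdf_bytes, $needle) = @_;
    # Capture the bytes between each "stream" and "endstream" keyword pair.
    while ($pdf_bytes =~ /stream\r?\n(.*?)\r?\n?endstream/sg) {
        my $raw   = $1;                 # copy out of the read-only capture
        my $plain = uncompress($raw);   # undef if the data is not valid zlib
        next unless defined $plain;
        return 1 if index($plain, $needle) >= 0;
    }
    return 0;
}
```

To use it, read the PDF with a `'<:raw'` filehandle into a scalar and pass that scalar as `$pdf_bytes`. A dedicated PDF parsing module is likely to be more robust than this regex-based scan, which can miss a stream when no line break precedes `endstream`.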
Re: Search within a PDF file
by neilwatson (Priest) on Mar 08, 2005 at 19:55 UTC