jai_dgl has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm trying to extract text data from the following file:

http://www.bluecatnetworks.com/download_files/careers/QA_Software_Automation_Developer.pdf
(the file is stored locally).
The script is as follows:
use CAM::PDF;

my $filename = 'QA_Software_Automation_Developer.pdf';
my $pdf = CAM::PDF->new($filename);
my $pagenos = $pdf->numPages();
for (my $i=1;$i<=$pagenos;$i++) {
    $page_text .= ' '. $pdf->getPageText($i);
}
print "page :: $page_text\n";
I'm getting the following error:
Can't call method "numPages" on an undefined value at readpdf.pl line
Is there any other CPAN module that can parse this kind of PDF?
Thanks
Jey

Replies are listed 'Best First'.
Re: CAM::PDF Not extracting
by marto (Cardinal) on Sep 01, 2009 at 09:33 UTC

    You don't seem to be checking for errors when opening the PDF:

    my $pdf = CAM::PDF->new($filename) || die "$CAM::PDF::errstr\n";

    CAM::PDF comes with various scripts, including getpdftext. Try running that against your PDF.

    Martin

      my $pdf = CAM::PDF->new($filename) || die "$CAM::PDF::errstr\n";
      I tried the above syntax and got the following error message:
      Incorrect password(s). The document cannot be decrypted.

      But this file doesn't seem to be password protected.
      Thanks
      Jey
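
For reference, the error-checking advice above can be applied to the whole original script. This is only a minimal sketch: pdf_text() is a hypothetical helper name, and it assumes CAM::PDF->new returns undef on failure and leaves the reason in $CAM::PDF::errstr, as the one-liner above relies on.

```perl
use strict;
use warnings;
use CAM::PDF;

# Return the text of every page, or die with CAM::PDF's own error message.
# pdf_text() is a hypothetical helper name, not part of CAM::PDF.
sub pdf_text {
    my ($filename) = @_;

    # Assumption: new() returns undef on failure and sets $CAM::PDF::errstr.
    my $pdf = CAM::PDF->new($filename)
        or die "Cannot open $filename: $CAM::PDF::errstr\n";

    my $text = '';
    for my $page (1 .. $pdf->numPages()) {
        my $page_text = $pdf->getPageText($page);
        # Skip pages for which no text could be extracted.
        $text .= ' ' . $page_text if defined $page_text;
    }
    return $text;
}

# Run only when a filename is given on the command line.
if (@ARGV) {
    print "page :: ", pdf_text($ARGV[0]), "\n";
}
```

Checking the constructor's return value replaces the misleading 'Can't call method "numPages" on an undefined value' with the real cause, which in this thread turned out to be the decryption failure.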
Re: CAM::PDF Not extracting
by derby (Abbot) on Sep 01, 2009 at 13:02 UTC

    You can check out the pdfinfo.pl script that is part of CAM::PDF for how to handle passwords ('cuz obviously CAM::PDF thinks it's password protected). I prefer to shell out and use the pdfinfo binary that's part of xpdf. xpdf has been more reliable for me than CAM::PDF (which, unless recently updated, does not handle the latest versions of PDF).

    -derby
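
For the text itself, the shell-out approach might look like the sketch below. It uses pdftotext, the text-extraction companion to pdfinfo in xpdf, and assumes that binary is on your PATH. The sub name and the optional user-password argument are hypothetical additions for illustration; whether this particular file needs a password at all is not known from the thread.

```perl
use strict;
use warnings;

# Extract text by running xpdf's pdftotext and reading its standard output.
# Assumption: a pdftotext binary is installed and on PATH.
sub pdftotext {
    my ($filename, $user_password) = @_;

    my @cmd = ('pdftotext');
    # -upw passes a user password; only added when one is supplied.
    push @cmd, '-upw', $user_password if defined $user_password;
    # A trailing '-' tells pdftotext to write the text to stdout.
    push @cmd, $filename, '-';

    # List-form pipe open bypasses the shell, so the filename needs no quoting.
    open my $fh, '-|', @cmd
        or die "Cannot run pdftotext: $!\n";
    my $text = do { local $/; <$fh> };    # slurp all output at once

    # close() returns false if pdftotext exited with a non-zero status.
    close $fh
        or die "pdftotext failed on $filename (exit status " . ($? >> 8) . ")\n";

    return defined $text ? $text : '';
}

# Run only when a filename (and optionally a password) is given.
if (@ARGV) {
    print pdftotext(@ARGV[0, 1]);
}
```

Checking both open and close matters here: open fails when the binary is missing, while close reports a PDF that pdftotext itself could not read.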
Re: CAM::PDF Not extracting
by leocharre (Priest) on Sep 01, 2009 at 14:07 UTC