sarvan has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

As suggested, I am trying to use CAM::PDF to extract text from PDF and PPT documents. I installed CAM::PDF on my Ubuntu system and ran the following script.

#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

my $filename = shift || die "Supply pdf on command line\n";
my $pdf = CAM::PDF->new($filename);

print text_from_page(1);

sub text_from_page {
    my $pg_num = shift;
    return CAM::PDF::PageText->render($pdf->getPageContentTree($pg_num));
}

When I run this code with the page number set to 1, it returns all the text from page 1. But when I change the page number to 2, it prints the following:

Failed to open filter FlateDecode (Text::PDF::FlateDecode)
Unrecognized type in parseAny: 1 [binary data]...

Why does this occur? Could anyone please let me know?

Re: regarding doubts in CAM::PDF
by Perlbotics (Archbishop) on Aug 04, 2011 at 09:10 UTC
Re: regarding doubts in CAM::PDF
by Anonymous Monk on Aug 04, 2011 at 09:03 UTC

    Why does this occur? Could anyone please let me know?

    Because it hit an "Unrecognized type in parseAny".

    It's like saying "So sorry, I don't speak that foreign language."
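The first line of the error names Text::PDF::FlateDecode, so one thing worth ruling out is that a module needed for decompressing the page stream is not installed. The sketch below only checks whether some modules can be loaded. The module list is an assumption based on the error text (as far as I know, the Flate filter lives in Text::PDF::Filter and relies on Compress::Zlib), and `can_load` is a hypothetical helper, not part of CAM::PDF.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded
# in this Perl installation, 0 otherwise.
sub can_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Assumed list of modules involved in FlateDecode handling.
for my $module (qw(CAM::PDF Text::PDF::Filter Compress::Zlib)) {
    printf "%-20s %s\n", $module, can_load($module) ? 'ok' : 'MISSING';
}
```

If anything is reported as MISSING, installing it from CPAN and rerunning the script on page 2 would show whether that was the cause. If everything loads, the PDF itself is the more likely suspect.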

Re: regarding doubts in CAM::PDF
by tmaly (Monk) on Aug 04, 2011 at 14:46 UTC

    It could be the version of the PDF that you are using. If I recall correctly, CAM::PDF only supports part of the PDF 1.5 spec.

    I ran into this problem and ended up using a trick with OpenOffice running as a service to convert between PDF and other formats.
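To check which version a file declares, you can read its header: a PDF normally begins with a line such as `%PDF-1.5`. This is a minimal sketch that does not use CAM::PDF at all. The function names `pdf_version_from_header` and `pdf_version` are made up for illustration, and it assumes the header sits at the very start of the file. Note that the document catalog can declare a newer version than the header, which this does not look at.

```perl
use strict;
use warnings;

# Extract the version number from a PDF header such as "%PDF-1.5".
# Returns undef if the string does not start with a PDF header.
sub pdf_version_from_header {
    my ($header) = @_;
    return $header =~ /^%PDF-(\d+\.\d+)/ ? $1 : undef;
}

# Read the first bytes of a file and return its declared PDF version,
# or undef if the file cannot be opened or has no PDF header.
sub pdf_version {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return;
    read $fh, my $header, 16;
    close $fh;
    return defined $header ? pdf_version_from_header($header) : undef;
}

# Usage: perl pdfversion.pl file.pdf
if (@ARGV) {
    my $version = pdf_version($ARGV[0]);
    print defined $version ? "PDF version $version\n" : "No PDF header found\n";
}
```

A file reporting 1.5 or later may use features such as object streams, which would fit the partial-support explanation above.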