As suggested, I am trying to use CAM::PDF to extract text from PDF and PPT documents. I installed CAM::PDF on my Ubuntu system and ran the following script:
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

# The PDF file to read is taken from the command line
my $filename = shift || die "Supply pdf on command line\n";
my $pdf = CAM::PDF->new($filename);

print text_from_page(1);

# Return the text of the given page number
sub text_from_page {
    my $pg_num = shift;
    return CAM::PDF::PageText->render($pdf->getPageContentTree($pg_num));
}
When I run this code with the page number set to 1, it returns all the text from page 1. But when I change the page number to 2, it prints the following:
Failed to open filter FlateDecode (Text::PDF::FlateDecode)
Unrecognized type in parseAny: 1 [raw binary stream data]...
Why does this occur? Could anyone please let me know?
In reply to regarding doubts in CAM::PDF by sarvan