in reply to read pdf text in hidden layer?

You should take a look at CAM::PDF. I think you will find it useful.

Thanks

Martin

Re^2: read pdf text in hidden layer?
by leocharre (Priest) on May 07, 2007 at 15:20 UTC

    I can't even instantiate an object...

    my $abs = '/var/doc/Towson/AP/IA/1 - VERIZON -@APIA.pdf';
    my $pdf = CAM::PDF->new($abs) or die("CAM PDF returns nothing");

    What's up with this module? No errors, no warnings, nothing??? And the documentation is a wreck. It looked so promising. It's really sad; a lot of work went into CAM::PDF, and I hope they revisit the POD.

      Really? Seems to work ok for me. Quick test:
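      #!/usr/bin/perl
      use strict;
      use warnings;
      use CAM::PDF;

      my $input  = 'E:\vecguid.pdf';
      my $output = 'E:\Test.pdf';

      # Report the module's own error string if the PDF cannot be loaded
      my $pdf = CAM::PDF->new($input) or die "$CAM::PDF::errstr\n";

      # Write the parsed document back out to a new file
      $pdf->output($output);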

      Did you try any of the example scripts?

      Martin

        Wow, thanks, this made a big difference. I didn't see these examples mentioned in the CAM::PDF docs. These are really useful utilities.

        I still hold to what I said earlier. I think docs/errors need more 'basic stuff'.

        I have a PDF that gives me this error when I open it: 'Expected stream open tag'. It's a die() with no tangible info. For example, what method died? What module? (It's from CAM::PDF::parseStream(); I had to find | xargs grep for it. That's fine, I can do that, but not everyone should have to.) Some Carp::confess() would be nice here.
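
        Until something like that exists in the module, a caller can get a similar effect from the outside. This is only a rough sketch: parse_stream() and load_document() are made-up stand-ins for a library that dies with a bare message, and trace_of() is a hypothetical helper name. The only real pieces are the core Carp module and the $SIG{__DIE__} hook.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical stand-ins for library code that dies with a bare message.
sub parse_stream  { die "Expected stream open tag\n" }
sub load_document { parse_stream() }

# Run a code ref with every die() promoted to Carp::confess(), and
# return the resulting error text (empty string if nothing died).
sub trace_of {
    my ($code) = @_;
    # Perl disables the __DIE__ hook while it runs, so confess() can die safely.
    local $SIG{__DIE__} = sub { Carp::confess(@_) };
    my $ok = eval { $code->(); 1 };
    return $ok ? '' : "$@";
}

my $trace = trace_of(\&load_document);
print $trace;    # original message plus the chain of calling subs
```

        The trace names each sub on the call path, which is the information the bare die() leaves out. The Carp::Always module from CPAN does much the same for a whole script, if installing it is an option.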

        pdftotext from xpdf works fine with the same file CAM::PDF chokes on. Dunno why. The PDF may be corrupt.

        Thank you for pointing these out!

        update

        It seems CAM::PDF::parseStream() expects a PDF stream tag to be followed by a newline. Some of these files going between NTFS and ext3 seem to have had their newlines mangled (you know the old story with FTP binmode), so maybe xpdf accepts ^M as a newline but CAM::PDF doesn't? Maybe I'll write to the author.
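
        One way to test that guess is to look at the raw bytes after each "stream" keyword. As far as I know, the PDF reference wants the keyword followed by LF or CRLF, and not by a bare CR. The sketch below is a heuristic byte scan, not a PDF parser, so a "stream" that happens to occur inside binary data gets counted too. stream_eol_counts() and slurp_raw() are names I made up; it uses only core Perl.

```perl
use strict;
use warnings;

# Count which end-of-line bytes follow each "stream" keyword.
# The lookbehind skips the "stream" inside "endstream".
sub stream_eol_counts {
    my ($bytes) = @_;
    my %count = (LF => 0, CRLF => 0, CR => 0, none => 0);
    while ($bytes =~ /(?<!end)stream(\r\n|\r|\n)?/g) {
        my $eol  = $1;
        my $kind = !defined $eol  ? 'none'
                 : $eol eq "\r\n" ? 'CRLF'
                 : $eol eq "\r"   ? 'CR'
                 :                  'LF';
        $count{$kind}++;
    }
    return \%count;
}

# Read a file as raw bytes, with no newline translation on any platform.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Usage: perl script.pl some.pdf
if (@ARGV) {
    my $count = stream_eol_counts(slurp_raw($ARGV[0]));
    printf "%-4s %d\n", $_, $count->{$_} for qw(LF CRLF CR none);
}
```

        A non-zero CR or none count on the failing file, and zero on one that loads, would point at the newline handling rather than at a corrupt PDF.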