leocharre has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use PDF::API2 to split a PDF document into many, one output document per page.
For example, if the source is file.pdf, I want file_page1.pdf, file_page2.pdf, etc.

This is my test script:

#!/usr/bin/perl
use strict;
use PDF::API2;

my $file_in = $ARGV[0];
my $prepend = $ARGV[1];
$prepend ||= 'out';

my $pdf_in = PDF::API2->open($file_in) or die;
my $count  = $pdf_in->pages;

for my $i ( 0 .. ( $count - 1 ) ) {
    my $_i       = ( $i + 1 );
    my $file_out = "$file_in.$_i.pdf";
    my $pdf_out  = PDF::API2->new or die;
    $pdf_out->importpage( $pdf_in, $_i ) or die;
    $pdf_out->saveas($file_out) or die;
    print STDERR "saved $file_out\n";
}
exit;

What's bugging me is that I get a PDF::API2 error:

leo@pony devel$ ./t/00bust.pl ./t/scan1.pdf 
Can't call method "outfilt" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/PDF/API2/Util.pm line 688.

Is this possibly a bug, or am I doing this wrong?
I tried a few different PDF sources, with the same result. Here's one source PDF I tried: http://web.aces.uiuc.edu/vista/pdf_pubs/ICESTORM.PDF

I have the latest PDF::API2 from CPAN.

Update

The failure happens on an i386 machine.

I tested on an x86_64 machine with PDF::API2 version 2.006, and it worked. I updated to version 2.015, tried again, and it still worked fine. So it must be something else, perhaps a dependency.
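One way to narrow down the dependency is to run the same small script on both machines and compare what it prints: perl version, architecture, and module versions. This is only a sketch; module_versions() is a hypothetical helper, and the default module list (PDF::API2 plus Compress::Zlib) is a guess at what is relevant. Pass other module names as arguments to check those instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Return { module name => version string, 'unknown', or 'not installed' }.
sub module_versions {
    my %version;
    for my $module (@_) {
        ( my $file = "$module.pm" ) =~ s{::}{/}g;    # PDF::API2 -> PDF/API2.pm
        if ( eval { require $file; 1 } ) {
            my $v = $module->VERSION;
            $version{$module} = defined $v ? $v : 'unknown';
        }
        else {
            $version{$module} = 'not installed';
        }
    }
    return \%version;
}

# Assumption: these defaults are a guess; override them on the command line.
my @modules = @ARGV ? @ARGV : qw(PDF::API2 Compress::Zlib);
printf "perl %vd on %s\n", $^V, $Config{archname};
my $versions = module_versions(@modules);
printf "%-20s %s\n", $_, $versions->{$_} for @modules;
```

Diffing the output from the i386 box against the x86_64 box shows at a glance which versions differ.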

Re: PDF::API2 importpage problem
by waba (Monk) on Aug 05, 2008 at 18:54 UTC

    I can't reproduce your problem with perl 5.10.0 in a 32-bit chroot on my x86_64 computer. I'm using PDF::API2 2.015 as well.

    Your code runs just fine, although it dies at the saveas($file_out) or die line because, according to the module source, saveas() never returns a true value. However, the resulting PDF files seem to be corrupted.
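    A minimal sketch of the same split without the or die after saveas(), writing the file_pageN.pdf names asked for in the question. It only removes that false failure and is not a fix for the corrupted output. page_filename() and split_pdf() are hypothetical names, and PDF::API2 is loaded with require inside split_pdf() so the naming helper also works where the module is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: ("file.pdf", 3) -> "file_page3.pdf"
# (naming scheme taken from the question).
sub page_filename {
    my ( $file_in, $page ) = @_;
    ( my $base = $file_in ) =~ s/\.pdf\z//i;    # drop a trailing .pdf or .PDF
    return "${base}_page$page.pdf";
}

# Write one single-page PDF per page of $file_in; return the file names.
sub split_pdf {
    my ($file_in) = @_;
    require PDF::API2;    # loaded at run time so page_filename() works without it
    my $pdf_in = PDF::API2->open($file_in);    # dies if the file is missing
    my @written;
    for my $n ( 1 .. $pdf_in->pages ) {
        my $pdf_out = PDF::API2->new;
        $pdf_out->importpage( $pdf_in, $n );    # appends source page $n
        my $file_out = page_filename( $file_in, $n );
        $pdf_out->saveas($file_out);    # no "or die": saveas() does not return true
        push @written, $file_out;
    }
    return @written;
}

if (@ARGV) {
    print STDERR "saved $_\n" for split_pdf( $ARGV[0] );
}
```

    Run it with the source PDF as the only argument; like the original, it prints one "saved" line per page.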

    Update: There is also CAM::PDF. If you're stuck with one, you could give the other a try...

      Yes, I noticed that: saveas() does not return a true value on success. PDF::API2 works well for this procedure on a few machines, but it fails on my i386. I tried CAM::PDF as you suggested; I had to force-install it via CPAN. Here is the example:
      #!/usr/bin/perl
      use strict;
      use CAM::PDF;

      my $file_in = $ARGV[0];
      my $prepend = $ARGV[1];
      $prepend ||= 'out';

      my $pdf   = CAM::PDF->new($file_in);
      my $count = $pdf->numPages;
      undef $pdf;

      for my $i ( 0 .. ( $count - 1 ) ) {
          my $_i       = ( $i + 1 );
          my $file_out = sprintf "%s_page_%04d.pdf", $file_in, $_i;

          # make sure it's not there
          unlink $file_out;

          my $pdf = CAM::PDF->new($file_in);
          $pdf->extractPages($_i);
          $pdf->cleansave;
          $pdf->output($file_out);
          print STDERR "saved $file_out\n";
      }
      exit;

      It works, but slowly. As you can see, I re-instance a whole new PDF object for every page I want, so if the PDF has 110 pages, I have to re-instance it 110 times, once per page.

      I can't figure out another way right now with CAM::PDF.

      I really wish the documentation for these modules, PDF::API2 and CAM::PDF, were a little better. And why are all the methods named differently, like numPages() versus pages()? The interface is an inconsistency from hell. I don't care how much work went into this; the messy API is unforgivable.

        I know the feeling :/ Generally speaking, PDF::API2 is powerful but old and of poor quality, while CAM::PDF is clean but slow and lacks features. Specifically, CAM::PDF eagerly loads the whole file in memory before anything else, while more advanced libraries like PDF::API2 take a lazy and more efficient approach.
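        To make the eager/lazy distinction concrete, here is a toy sketch in plain Perl. It is not how PDF::API2 or CAM::PDF are implemented; eager_loader(), lazy_loader() and the decode stand-in are hypothetical. An eager loader decodes every object up front, so reading one object costs as much as reading all of them, while a lazy loader decodes an object on first use and caches it.

```perl
use strict;
use warnings;

# Eager: decode every object before any lookup is possible.
sub eager_loader {
    my ( $raw, $decode ) = @_;    # $raw: { id => undecoded text }
    my %objects = map { $_ => $decode->( $raw->{$_} ) } keys %$raw;
    return sub { $objects{ $_[0] } };
}

# Lazy: decode an object only when it is first requested, then cache it.
sub lazy_loader {
    my ( $raw, $decode ) = @_;
    my %cache;
    return sub {
        my ($id) = @_;
        $cache{$id} = $decode->( $raw->{$id} ) unless exists $cache{$id};
        return $cache{$id};
    };
}

# Demo: 110 stand-in objects, but only one of them is actually needed.
my %raw     = map { $_ => "object $_" } 1 .. 110;
my $decodes = 0;
my $decode  = sub { $decodes++; return uc $_[0] };    # stand-in for real parsing

eager_loader( \%raw, $decode )->(1);
print "eager: $decodes decodes to read one object\n";

$decodes = 0;
lazy_loader( \%raw, $decode )->(1);
print "lazy:  $decodes decode(s) to read one object\n";
```

        Presumably this is part of why the CAM::PDF script above slows down: each new CAM::PDF object pays the up-front cost for the whole file again.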

        The truth is, the PDF specification is 1300+ pages and growing. I don't think any of these packages will ever implement it fully and correctly while also providing additional abstraction layers and tests (workload x2) and comprehensive documentation (workload x3).

        PDF::API2 is on the right path as far as code structure goes, but it has no tests (assume it's broken ;-) and its meager documentation requires that you already know the PDF specs. CAM::PDF, on the other hand, has code of much higher quality (tests, docs) but is less advanced, and the library as a whole is begging for a redesign (106 methods in the one public class, yuck). Volunteers, anyone? :-)