in reply to Re: PDF::API2 importpage problem
in thread PDF::API2 importpage problem

Yes, I noticed that: saveas() does not return on success. PDF::API2 works well for this procedure on a few machines, but it fails on my i386. I tried CAM::PDF as you suggested; I had to force-install it via cpan. This is the example:
#!/usr/bin/perl
use strict;
use CAM::PDF;

my $file_in = $ARGV[0];
my $prepend = $ARGV[1];
$prepend ||= 'out';

my $pdf = CAM::PDF->new($file_in);
my $count = $pdf->numPages;
undef $pdf;

for my $i ( 0 .. ( $count - 1 )){
   my $_i = ($i+1);
   my $file_out = sprintf "%s_page_%04d.pdf", $file_in, $_i;

   # make sure it's not there
   unlink $file_out;

   my $pdf = CAM::PDF->new($file_in);
   $pdf->extractPages($_i);
   $pdf->cleansave;
   $pdf->output($file_out);
   print STDERR "saved $file_out\n";
}

exit;

It works. Slowly. You can see I am re-instantiating a whole new PDF object for each page I want. So, if the PDF has 110 pages, I have to re-instantiate 110 times, once for each page.

I can't figure out another way right now with CAM::PDF.
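
One variation that might help, as an untested sketch: read the file from disk once and build each per-page object from the in-memory copy. This assumes CAM::PDF->new() accepts the document content as a string as well as a filename, which is how I read its documentation; please check that against your installed version. The helper names page_filename() and split_pdf() are made up for this example. It still parses the document once per page, so any speedup comes only from skipping the repeated disk reads.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper: builds the output name the same way as the script above.
sub page_filename {
    my ($base, $page) = @_;
    return sprintf '%s_page_%04d.pdf', $base, $page;
}

# Hypothetical helper: writes one PDF per page of $file_in,
# reading the input from disk only once.
sub split_pdf {
    my ($file_in) = @_;

    # Slurp the raw bytes of the input file.
    open my $fh, '<:raw', $file_in or die "cannot open $file_in: $!";
    my $content = do { local $/; <$fh> };
    close $fh;

    # Assumption: new() takes PDF content as a string, not only a filename.
    my $probe = CAM::PDF->new($content)
        or die "cannot parse $file_in: $CAM::PDF::errstr";
    my $count = $probe->numPages;

    for my $page (1 .. $count) {
        # extractPages() modifies the object, so each page still needs a
        # fresh one; it is now built from memory instead of from disk.
        my $pdf = CAM::PDF->new($content)
            or die "cannot parse $file_in: $CAM::PDF::errstr";
        $pdf->extractPages($page);
        $pdf->cleansave;

        my $file_out = page_filename($file_in, $page);
        $pdf->output($file_out);
        print STDERR "saved $file_out\n";
    }
    return $count;
}

if (@ARGV) {
    split_pdf($ARGV[0]);
}
else {
    print STDERR "usage: $0 file.pdf\n";
}
```

Whether this is noticeably faster depends on how much of the time goes to file I/O versus parsing; I have not measured it.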

I really wish the documentation were a little bit better for these modules, PDF::API2 and CAM::PDF. And what's with naming all the methods differently, like numPages() and pages()? The interface is an inconsistency from hell. I don't care how much they worked on this; the messy API is unforgivable.

Re^3: PDF::API2 importpage problem
by waba (Monk) on Aug 07, 2008 at 18:25 UTC

    I know the feeling :/ Generally speaking, PDF::API2 is powerful but old and of poor quality, while CAM::PDF is clean but slow and lacks features. Specifically, CAM::PDF eagerly loads the whole file in memory before anything else, while more advanced libraries like PDF::API2 take a lazy and more efficient approach.

    The truth is, the PDF specification is 1300+ pages and growing. I don't think any of these packages is ever going to implement it fully and correctly while still providing additional abstraction layers, tests (workload x2) and comprehensive documentation (workload x3).

    PDF::API2 is on the right path as far as code structure goes, but has no tests (assume it's broken ;-) and its meager documentation requires that you know the PDF specs already. On the other hand, the CAM::PDF code is of much higher quality (tests, docs) while less advanced, and the library as a whole is begging for a redesign (106 methods in the one public class, yuck). Volunteers, anyone? :-)