lennelei has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have a very small Perl script that splits PDF files longer than 100 pages:

use strict;
use warnings;
use PDF::API2;

#...some code here...

#for testing purpose:
my $file='path_to_some.pdf';
#

my $oldpdf = PDF::API2->open($file);
if ($oldpdf->pages > 100) {
    my $newpdf = PDF::API2->new;
    printf " (%d pages)\n", $oldpdf->pages;
    for my $page_nb (1..10) {
        $newpdf->importpage($oldpdf, $page_nb);
    }
    $newpdf->saveas("_$file");
}
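The script above copies a single page range. If a long document has to be split into several consecutive parts, the ranges can be computed before any PDF module is involved. This is only a minimal sketch: `page_chunks` is a hypothetical helper name, not part of PDF::API2, and the 250-page total is an invented example value.

```perl
use strict;
use warnings;

# Hypothetical helper: split pages 1..$total into consecutive ranges
# of at most $size pages, returned as [first, last] pairs.
sub page_chunks {
    my ($total, $size) = @_;
    die "chunk size must be positive\n" if $size < 1;
    my @chunks;
    for (my $first = 1; $first <= $total; $first += $size) {
        my $last = $first + $size - 1;
        $last = $total if $last > $total;   # the final chunk may be shorter
        push @chunks, [$first, $last];
    }
    return @chunks;
}

# Example: a 250-page document split into chunks of 100 pages.
printf "%d-%d\n", @$_ for page_chunks(250, 100);
```

Each [first, last] pair could then drive the import loop for one output file.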

I'm running this on Windows (Windows 7 on my desktop, Windows 2008/2012 on the servers) with Strawberry Perl 5.14 and the PDF::API2 module installed using cpan.bat.

It has been in use for weeks without trouble until this week. For a PDF received a few days ago, the script outputs a document of 100 blank pages.

I tried using the alternative with importPageIntoForm by snoopy from http://www.perlmonks.org/?node_id=615492 with the same result.

I also tried another tool (sejda) and the pages are correctly extracted so it's probably an issue with PDF::API2 or a misconfiguration but I don't know what to add/change in the script.

FYI, the sejda command line:

sejda-console.bat extractpages -f SOURCE.PDF -o TARGET.PDF -s 1-100

Any idea/alternative I could try? I'd like to keep the Perl as this is only a small part of a bigger script, but if I have no other option, I'll use sejda for the split.
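Since the split is only one step of a larger Perl script, the sejda command line shown above could be called from Perl as a fallback. The sketch below only builds the argument list, using the same subcommand and flags as that command line; `sejda_extract_cmd` is a hypothetical helper name, and the commented-out `system` call assumes sejda-console.bat is on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for the sejda call shown
# above (extractpages -f SOURCE -o TARGET -s FIRST-LAST).
sub sejda_extract_cmd {
    my ($source, $target, $first, $last) = @_;
    return ('sejda-console.bat', 'extractpages',
            '-f', $source, '-o', $target, '-s', "$first-$last");
}

my @cmd = sejda_extract_cmd('SOURCE.PDF', 'TARGET.PDF', 1, 100);
print "@cmd\n";

# To actually run it (assumption: sejda-console.bat is on the PATH):
# system(@cmd) == 0 or die "sejda failed with status $?\n";
```

Passing the command as a list rather than a single string avoids having to quote file names by hand.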

Unfortunately, I cannot provide the PDF :(

Thank you

Edit: I just found CAM::PDF and tried it using the following code, and it's working!

From what I've seen, the difference between the two approaches is that PDF::API2's import_page function tries to copy the content of each page into a new document, whereas CAM::PDF's extractPages function removes the pages outside the given range from the existing document. Maybe there is a similar method in PDF::API2, but I haven't found it yet.

use strict;
use warnings;
use CAM::PDF;

#...some code here...

#for testing purpose:
my $file='path_to_some.pdf';
#

my $oldpdf = CAM::PDF->new($file) or die "$CAM::PDF::errstr\n";
if ($oldpdf->numPages() > 100) {
    printf " (%d pages)\n", $oldpdf->numPages();
    $oldpdf->extractPages(1..100);
    $oldpdf->cleanoutput("_$file");
}

Replies are listed 'Best First'.
Re: blank pdf generated using PDF::API2
by thanos1983 (Parson) on Jul 21, 2017 at 08:16 UTC

    Hello lennelei

    Welcome to the Monastery. Try PDF::API2's import_page() (see PAGE METHODS in the module documentation); it should work, I tested it on my machine. Sample of working code:

    Update: Minor note: there is no importpage() method; you probably mean import_page(), which works as expected. See the sample code below.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PDF::API2;

    my $file='test.pdf';

    my $newpdf = PDF::API2->new();
    my $oldpdf = PDF::API2->open($file);

    if ($oldpdf->pages() > 1) {
        # pages() is the PDF::API2 page count (numPages() belongs to CAM::PDF)
        printf " (%d pages)\n", $oldpdf->pages();
        for my $page_nb (1..8) {
            # arguments: source PDF, source page number, target page number
            $newpdf->import_page($oldpdf, $page_nb, $page_nb);
        }
        $newpdf->saveas("test_2.pdf");
    }

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Changes at revision 2.022

      2.022     2014-07-04
      
          - Added $pdf->version() get/set method.  When opening an existing PDF, the
            existing version number will now be retained.
      
          - Renamed the following in PDF::API2:
              - importpage to import_page
              - openScalar to open_scalar
      
      poj
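Given the rename listed in the changelog above, a script that may run against different PDF::API2 versions could check which method name is available before calling it. This is a sketch only: `import_method_name` is a hypothetical helper, and the two small classes are stand-ins so the example runs without PDF::API2 installed.

```perl
use strict;
use warnings;

# Hypothetical helper: return whichever page-import method name the
# object supports, preferring the newer import_page.
sub import_method_name {
    my ($pdf) = @_;
    for my $name (qw(import_page importpage)) {
        return $name if $pdf->can($name);
    }
    die "object supports neither import_page nor importpage\n";
}

# Stand-in classes (not real PDF::API2 objects) for demonstration.
{ package OldStylePdf; sub new { bless {}, shift } sub importpage  { 'old' } }
{ package NewStylePdf; sub new { bless {}, shift } sub import_page { 'new' } }

for my $pdf (OldStylePdf->new, NewStylePdf->new) {
    my $method = import_method_name($pdf);
    printf "%s uses %s\n", ref $pdf, $method;
}
```

With a real PDF::API2 object, the returned name could be used as a dynamic method call, for example `$newpdf->$method($oldpdf, $page_nb)`.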

        Thanks poj I had no clue... :D

      Thank you for your help and your welcome :)

      You're right about the method name; I don't know why I used importpage (I probably found it in an old example). However, both methods return the same result, so it doesn't work for my specific PDF either.

      As I (tried to) explain, my script is working: we have been using it in a production environment for weeks now. There is only one file it failed to process correctly, and for that file the output is a PDF of 100 blank pages.

        Hello again lennelei,

        Notice this line: $newpdf->import_page($oldpdf, $page_nb, $page_nb); it uses 3 parameters, not 2. Try copying the example that I provided and test it. Does it work? I simulated the scenario with a 7-page PDF that I have, and it seems to work just fine.

        I did not have a 100-page PDF at hand, so I could not really check it, but give it a try; I assume it should work.

        Update: I just tested the sample code that I provided with a PDF of 123 pages. It works just fine. The only line that I modified is the for loop (for my $page_nb (1..$oldpdf->pages())).

        Update2: Full sample of executable code below:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use PDF::API2;
        use feature 'say';

        my $file='test.pdf';

        my $newpdf = PDF::API2->new();
        my $oldpdf = PDF::API2->open($file);

        if ($oldpdf->pages() > 1) {
            say $oldpdf->pages() . ' pages.';
            for my $page_nb (1..$oldpdf->pages()) {
                $newpdf->import_page($oldpdf, $page_nb, $page_nb);
            }
            $newpdf->saveas("test_2.pdf");
        }

        Let us know if it works, BR.