in reply to Re^3: blank pdf generated using PDF::API2 (Updated)
in thread blank pdf generated using PDF::API2

Using 3 parameters produces exactly the same result with my pdf (I used your script without the printf " (%d pages)\n", $oldpdf->numPages(); line, which is for CAM::PDF). I'd like to provide the pdf file so that you can try it, but I first have to remove the sensitive information in it, and I don't really know how to do that at the moment :)

Thank you again!

Re^5: blank pdf generated using PDF::API2 (Updated)
by AnomalousMonk (Archbishop) on Jul 21, 2017 at 10:53 UTC
    I'd like to provide the pdf file ... I don't really know how ...

    How about going the other way? What happens when you run your code (or, indeed, the other monks' code) against some 100+ page document they seem to be having success with, e.g., Modern Perl?


    Give a man a fish:  <%-{-{-{-<

      First of all, I'm sorry if everything is not clear: English is not my native language.

      If you want the full story, here it is (you can skip that part :). We (my company) have specific OCR software to handle PDF bills. I made a Perl script that extracts pdf files from emails or retrieves them from our MFT software, and renames them for normalization. As I learned that our OCR software doesn't work well with big pdfs, I tried to add some code to the script to check whether a file has more than 100 pages and, in that case, keep only the first 100 pages (dropping the rest).

      As I didn't want to bother you with all the details, I only kept the part that cuts the pdf in my first post.

      To summarize: for any pdf, I need to keep at most the first 100 pages (if the pdf is 15 pages, I leave it untouched; if it's 654 pages, I create a new pdf with pages 1 to 100 inclusive).

      ---- End of the story ----

      Once again, my script is working (99.9% of the time): my problem is not how to write it but why it fails for one (only one) pdf and what I can do about it (if I can do anything)!

      I didn't try the script against the "Modern Perl" file because unfortunately, I don't have it (yet), but I have a lot of 100+ page pdfs (up to 600 pages) and they are all (but one) correctly processed by my script.

      I would like to provide you with this specific pdf, which probably has something that prevents PDF::API2 from processing it correctly, but I cannot as it contains customer information (I'm looking for a way to obfuscate the content).

      What's strange is that I managed to extract 100 pages from this specific pdf using sejda or CAM::PDF and the extractPages method.

      But with PDF::API2, it's not working.
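Since CAM::PDF and its extractPages method do handle this file, one possible workaround is to truncate through CAM::PDF whenever a file exceeds the limit. The following is only a minimal sketch under that assumption: the helper names pages_to_keep and truncate_with_cam_pdf, the $MAX_PAGES constant and the command-line handling are hypothetical, and it does not explain why PDF::API2 produces blank pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $MAX_PAGES = 100;    # limit taken from the description above

# Hypothetical helper: list of page numbers to keep, or an empty list
# when the file is short enough to be left untouched.
sub pages_to_keep {
    my ($page_count, $max) = @_;
    return $page_count > $max ? (1 .. $max) : ();
}

# Hypothetical fallback: write the first $MAX_PAGES pages of $in to $out
# using CAM::PDF. Returns 1 if a truncated copy was written, 0 otherwise.
sub truncate_with_cam_pdf {
    my ($in, $out) = @_;
    require CAM::PDF;    # loaded lazily, only when the fallback is used
    my $pdf = CAM::PDF->new($in) or die "Cannot open $in\n";
    my @keep = pages_to_keep($pdf->numPages(), $MAX_PAGES);
    return 0 unless @keep;
    $pdf->extractPages(@keep);    # drops every page not listed
    $pdf->cleanoutput($out);
    return 1;
}

# Usage: script.pl input.pdf output.pdf
truncate_with_cam_pdf(@ARGV) if @ARGV == 2;
```

A 15-page file yields an empty list and is left alone; a 654-page file yields pages 1 to 100.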

        Try this program against your problem pdf. What version of PDF::API2 do you have?

        #!/usr/bin/perl
        use strict;
        use warnings;
        use PDF::API2;

        my $file = 'some.pdf';
        my $pdf = PDF::API2->open($file);
        my $pages = $pdf->pages();
        printf "PDF Version : %s\n",$pdf->version();
        printf "Pages : %s\n",$pdf->pages();
        for my $n (1..$pages){
          my $page = $pdf->openpage($n);
          printf "Page %3d Media %5.2f %5.2f %5.2f %5.2f\n",$n,$page->get_mediabox;
        }
        poj

        Hello again lennelei,

        This works as expected, based on your last update ("To summarize: for any pdf, I need to keep at most the first 100 pages (if the pdf is 15 pages, I leave it untouched; if it's 654 pages, I create a new pdf with pages 1 to 100 inclusive).").

        It creates a new 100-page pdf if the original pdf has more than 100 pages.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use PDF::API2;

        my $file='test.pdf';
        my $newpdf = PDF::API2->new();
        my $oldpdf = PDF::API2->open($file);

        if ($oldpdf->pages() > 100) {
          printf " (%d pages)\n", $oldpdf->pages();
          for my $page_nb (1..100) {
            $newpdf->importpage($oldpdf, $page_nb, $page_nb);
          }
          $newpdf->saveas("_".$file);
        }

        Hope this helps, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!
        ... I have a lot of 100+ page pdfs (up to 600 pages) and they are all (but one) correctly processed by my script.

        Ok, I understand better now. I had thought you were having problems with 100+ page PDFs in general.


        Give a man a fish:  <%-{-{-{-<

Re^5: blank pdf generated using PDF::API2 (Updated)
by thanos1983 (Parson) on Jul 21, 2017 at 10:50 UTC

    Hello lennelei,

    How many pages of the old file do you want to keep? Hold on, are you trying to split the old file into sets of new pdfs of 10 pages each? If so, try something like this.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PDF::API2;
    use Data::Dumper;

    my $file   = 'test.pdf';
    my $oldpdf = PDF::API2->open($file);

    # Last page of each 10-page chunk: 10, 20, ..., 100
    my @steps = map { 10 * $_ } 1 .. 10;
    # print Dumper \@steps;

    if ($oldpdf->pages() > 100) {
        my $num       = 0;
        my $last_Step = 1;
        for my $step (@steps) {
            my $newpdf = PDF::API2->new();
            for my $page_nb ($last_Step .. $step) {
                # Target index 0 appends the page at the end of the new pdf
                $newpdf->import_page($oldpdf, $page_nb, 0);
            }
            $num++;
            $newpdf->saveas("pdf/$num"."_"."$file");
            # Start the next chunk on the following page so chunks do not overlap
            $last_Step = $step + 1;
        }
    }
    __END__
    $ ll pdf/
    total 5736
    drwxrwxr-x 2 tinyos tinyos    4096 Jul 21 12:49 ./
    drwxrwxr-x 8 tinyos tinyos    4096 Jul 21 12:48 ../
    -rw-rw-r-- 1 tinyos tinyos  980911 Jul 21 12:49 10_test.pdf
    -rw-rw-r-- 1 tinyos tinyos  274610 Jul 21 12:49 1_test.pdf
    -rw-rw-r-- 1 tinyos tinyos  508740 Jul 21 12:49 2_test.pdf
    -rw-rw-r-- 1 tinyos tinyos  340428 Jul 21 12:49 3_test.pdf
    -rw-rw-r-- 1 tinyos tinyos  355785 Jul 21 12:49 4_test.pdf
    -rw-rw-r-- 1 tinyos tinyos  216205 Jul 21 12:49 5_test.pdf
    -rw-rw-r-- 1 tinyos tinyos  505735 Jul 21 12:49 6_test.pdf
    -rw-rw-r-- 1 tinyos tinyos  248888 Jul 21 12:49 7_test.pdf
    -rw-rw-r-- 1 tinyos tinyos 1027594 Jul 21 12:49 8_test.pdf
    -rw-rw-r-- 1 tinyos tinyos 1387582 Jul 21 12:49 9_test.pdf

    Unless I misunderstood. :) Give us a step-by-step description of what you are trying to achieve. You have a file with 100 pages and you want to create a new pdf with how many pages of the original file?
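If splitting into fixed-size chunks is the goal, the page boundaries can be computed separately from the PDF handling, which makes it easier to check that consecutive chunks do not share a page. This is only a sketch: chunk_ranges is a hypothetical helper, not part of PDF::API2, and the 25-page, 10-per-chunk figures are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split pages 1..$last into consecutive
# [first, last] pairs of at most $size pages each.
sub chunk_ranges {
    my ($last, $size) = @_;
    my @ranges;
    for (my $first = 1; $first <= $last; $first += $size) {
        my $end = $first + $size - 1;
        $end = $last if $end > $last;    # final chunk may be shorter
        push @ranges, [ $first, $end ];
    }
    return @ranges;
}

# Example: a 25-page document in chunks of 10 pages
printf "%d-%d\n", @$_ for chunk_ranges(25, 10);
```

For 25 pages this prints 1-10, 11-20 and 21-25, one range per line. Each pair can then drive the inner import loop of a script like the one above.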

    Seeking for Perl wisdom...on the process of learning...not there...yet!