
Hello again lennelei,

This works as expected, based on your last update: "To summarize: for any pdf, I need to keep at most the first 100 pages (if the pdf is 15 pages, I leave it untouched; if it's 654 pages, I create a new pdf with pages 1 to 100 included)."

It creates a new 100-page pdf only if the original pdf has more than 100 pages.

#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

my $file   = 'test.pdf';
my $newpdf = PDF::API2->new();
my $oldpdf = PDF::API2->open($file);

# Only build a truncated copy when the source has more than 100 pages.
if ($oldpdf->pages() > 100) {
    printf " (%d pages)\n", $oldpdf->pages();
    # Copy pages 1..100 into the same positions in the new document.
    for my $page_nb (1..100) {
        $newpdf->importpage($oldpdf, $page_nb, $page_nb);
    }
    $newpdf->saveas("_".$file);
}

Hope this helps, BR.

Seeking for Perl wisdom...on the process of learning...not there...yet!

Re^8: blank pdf generated using PDF::API2 (Updated)
by lennelei (Acolyte) on Jul 21, 2017 at 13:30 UTC
    Hi, I know this works as expected :s but not for one given file! After lots of testing, I think it might be because the problem pdf is password protected (probably against modifications, as no password is asked for when reading the file).

      Hello again lennelei,

      Well, this makes more sense ("I think it might be because the problem pdf is password protected"). For future reference, did you set the prompt_for_password => $boolean option described in the CAM::PDF documentation (FUNCTIONS / Object creation/manipulation), or was it not needed when you copied the pages, so that it worked out of the box?

      Thanks for the update, BR.


        No: it worked out of the box. I'm sorry I didn't see earlier that the file was protected: I didn't think about it, as no password was asked for and no message was displayed (and obviously there were no errors from the scripts) when I manipulated the file manually (via Acrobat Reader, sejda console, sejda desktop or even Perl scripts). I'm not a PDF expert, but I presume that the password is an authentication mechanism more than a protection, as it doesn't prevent anything from reading the file. But in that case, how is the content deciphered automatically?
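On the "deciphered automatically" question: with the PDF standard security handler, a file can carry an owner password (restricting modification) together with an empty user password. The decryption key is derived from the user password, so a reader can decrypt the content without prompting. A tool that does not decrypt at all would see scrambled strings and streams, which may be what happens with the unreadable output here. Below is a rough, core-Perl-only sketch for checking whether a file references an encryption dictionary. The helper name looks_encrypted is hypothetical, and the check is a plain byte scan for the /Encrypt key rather than a real parser, so treat the result as a hint only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Heuristic (hypothetical helper, not part of PDF::API2 or CAM::PDF):
# return 1 if the raw bytes of the file contain an /Encrypt key, which the
# trailer or cross-reference stream dictionary of an encrypted PDF carries.
# This is a byte scan, so a false positive is possible if the same bytes
# happen to appear elsewhere in the file.
sub looks_encrypted {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    local $/;                       # slurp the whole file
    my $bytes = <$fh> // '';
    close $fh;
    # \b keeps /EncryptMetadata from matching as /Encrypt
    return $bytes =~ m{/Encrypt\b} ? 1 : 0;
}

# Report on every file named on the command line.
for my $file (@ARGV) {
    printf "%s: %s\n", $file,
        looks_encrypted($file) ? 'has an /Encrypt entry' : 'no /Encrypt entry found';
}
```

Run it as, for example, `perl check_encrypt.pl file.pdf` (the script name is arbitrary). If it reports an /Encrypt entry for the problem file only, that would support the password-protection theory.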

        Anyway, for CAM::PDF, the script I gave in my first message is working exactly as I wrote it without any password related stuff:

        my $file = 'file.pdf';
        my $oldpdf = CAM::PDF->new($file) or die "$CAM::PDF::errstr\n";

        # Keep only pages 1..100 when the document is longer than that.
        if ($oldpdf->numPages() > 100) {
            printf " (%d pages)\n", $oldpdf->numPages();
            $oldpdf->extractPages(1..100);
            $oldpdf->cleanoutput("split_$file");
        }

        Still with CAM::PDF, the getPageText method works correctly and displays the real text of the file. I also managed to modify some data with getPageContent and setPageContent, but not all of it (I tried to obfuscate the file this way, but the resulting pdf was corrupted).

        With PDF::API2, on the other hand, the xmpMetadata method, for example, produces unreadable data on that file (I cannot give the result here: it doesn't render correctly on the site).

        I'm now looking for a way to use PDF::API2 the same way CAM::PDF works, i.e. by copying the pdf file and then removing the undesired pages, but I'm not sure this is possible.

        Thank you again for your help: it's almost time for me to end my work week, so I'll return to this subject on Monday. Have a nice weekend, folks.