eversuhoshin has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I want to remove the first two pages of a PDF file so that I can later convert it to an Excel file. Here is the file:

http://www.sec.gov/divisions/investment/13f/13flist2014q1.pdf

When I run my code, I get an error message saying that the PDF is cross-referenced. Any other way to do this would be appreciated. Thank you so much.

use strict;
use warnings;
use CAM::PDF;
use PDF::API2;
my $pdfone = CAM::PDF->new('C:/Documents/2014q1.pdf') || die "$CAM::PDF::errstr\n";
my $pdftwo = PDF::API2->open('C:/Documents/2014q1_new.pdf');
my $font = $pdftwo->corefont('Helvetica-Bold');
for my $pagenum (3 .. $pdfone->numPages()) {
    my $text = $pdfone->getPageText($pagenum) or next;
    my $page = $pdftwo->page();    # add a new page
    my $pdf_text = $page->text();
    $pdf_text->font($font, 12);
    my @lines = split("\n", $text);
    my ($x, $y) = (50, 700);
    for my $line (@lines) {
        $pdf_text->translate($x, $y);
        $pdf_text->text($line);
        $y = $y - 20;
    }
}
$pdftwo->saveas('C:/Documents/2014q1_new.pdf');

Replies are listed 'Best First'.
Re: Create new pdf File
by Athanasius (Archbishop) on Sep 03, 2014 at 03:28 UTC

    Hello eversuhoshin,

    I had to create the target file by changing this line:

    my $pdftwo = PDF::API2->open(...) ;

    to this:

    my $pdftwo = PDF::API2->new();

    Then the script ran successfully to completion, with no errors. I then changed the new call back to open, and ran again, also successfully and with no errors. The second run appended the same output to the output from the first run. One run takes around 2 minutes and produces 490 PDF pages. Here is a sample of the output (first page):

    So, I cannot reproduce the error you report.

    Hope that helps (somehow),

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,

Re: Create new pdf File
by AppleFritter (Vicar) on Sep 03, 2014 at 10:27 UTC

    You don't actually need Perl to drop pages from a PDF file. Use e.g. poppler:

    $ pdfseparate -f 3 13flist2014q1.pdf page%d.pdf
    $ pdfunite page?.pdf page??.pdf page???.pdf combined.pdf

    There's probably a tool somewhere that'll do this in one step, too.
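
For anyone who would rather stay in Perl, a one-step version might look like the rough sketch below. It copies whole pages with PDF::API2 instead of re-typesetting the extracted text, so the original page layout should carry over. The helper names pages_to_keep and drop_leading_pages are made up for this example, and the script takes the input and output paths as command-line arguments. It uses the older method names importpage, pages and saveas to match the saveas call in the question; newer PDF::API2 releases document these as import_page, page_count and save. It has not been tried against the SEC file, and whether PDF::API2 can open that particular PDF may depend on the installed version.

```perl
use strict;
use warnings;

# Return the 1-based page numbers that remain after dropping the
# first $skip pages of a $total-page document.
sub pages_to_keep {
    my ($total, $skip) = @_;
    return ($skip + 1 .. $total);
}

# Copy every page after the first $skip from $in into a brand-new file $out.
sub drop_leading_pages {
    my ($in, $out, $skip) = @_;
    require PDF::API2;    # loaded lazily so pages_to_keep() works without it
    my $src = PDF::API2->open($in);    # existing source document
    my $dst = PDF::API2->new();        # empty target document
    # With no target index, importpage appends the page at the end of $dst.
    $dst->importpage($src, $_) for pages_to_keep($src->pages(), $skip);
    $dst->saveas($out);
    return;
}

# Only runs when exactly two arguments (input path, output path) are given.
drop_leading_pages(@ARGV[0, 1], 2) if @ARGV == 2;
```

Saved as, say, drop_pages.pl (a hypothetical name), it would be run as: perl drop_pages.pl 13flist2014q1.pdf combined.pdf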