PerlMonks  

CAM::PDF Error: Expected identifier label

by jmlynesjr (Deacon)
on Mar 24, 2023 at 19:13 UTC ( [id://11151177]=perlquestion )

jmlynesjr has asked for the wisdom of the Perl Monks concerning the following question:

This falls into the "There are no five minute jobs" category. This should have been easy...

I have a 9-page PDF file from which I often want to delete pages 1 and 9, or extract pages 2..8. Both extractPages() and deletePages() fail with the same error, shown below. Any ideas? I also tried PDF::API2, but its installation fails because the PadWalker install fails. Thanks for your assistance.

Test Code

#! /usr/bin/perl

# Environment: Ubuntu 22.04LTS, Perl 5 version 34 subversion 0

use strict;
use warnings;
use CAM::PDF;

my $pdf = CAM::PDF->new('DX.pdf');
my $np = $pdf->numPages();
print "Number of pages: $np\n";
$pdf->extractPages( 2..8 );
$pdf->cleanoutput('DXout.pdf');

Output and error message

Number of pages: 9
Expected identifier label: 15 % 1 /Parent 1 0 R /MediaBox [ 0 0 ...

James

There's never enough time to do it right, but always enough time to do it over...

Re: CAM::PDF Error: Expected identifier label
by kcott (Archbishop) on Mar 24, 2023 at 23:04 UTC

    G'day James,

    I used the following to create a minimal, 9-page, PDF document:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use PDF::API2;

    my $pdf = PDF::API2::->new();
    my $font = $pdf->font('Helvetica-Bold');

    for my $content ('A' .. 'I') {
        my $page = $pdf->page();
        my $text = $page->text();
        $text->font($font, 20);
        $text->position(200, 700);
        $text->text($content);
    }

    $pdf->save('DX.pdf');

    A visual check of "DX.pdf", with Foxit Reader, showed nine pages with "A", "B", ..., "H", "I".

    I pretty much copied your posted code. Here's my exact script:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use CAM::PDF;

    my $pdf = CAM::PDF->new('DX.pdf');
    my $np = $pdf->numPages();
    print "Number of pages: $np\n";
    $pdf->extractPages( 2..8 );
    $pdf->cleanoutput('DXout.pdf');

    When I ran this, the only output was:

    Number of pages: 9

    A visual check of "DXout.pdf", with Foxit Reader, showed seven pages with "B", ..., "H".

    So, your code is behaving as expected. This suggests a problem with your source PDF. There could also be a problem with CAM::PDF itself: there are currently 52 active bugs; I did a quick scan but didn't see anything obvious; you may want to look more closely; updating to the latest version might be warranted.

    Thanks for posting your environment information. I'm using Cygwin on Win10 (both updated less than 24 hours ago); Perl 5.36.0 (via Perlbrew); CAM::PDF 1.60 (latest version, newly installed as I haven't used that module previously); PDF::API2 2.043 (that's what I had, latest is 2.044).

    As an afterthought, I did `perlbrew switch perl-5.34.0` and repeated the above tests. I got the same successful results: your Perl version doesn't appear to be an issue.

    For your PDF::API2 installation issue, I suggest you post that separately and include verbatim error & warning messages.

    — Ken

      The PDF format moves on; CAM::PDF hasn't.

        Fair enough comment and could point to the problem.

        CAM::PDF was last updated a decade ago; PDF::API2 was last updated a few months ago. When the OP sorts out the PDF::API2 installation problem, something like this might be a better way to go:

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use PDF::API2;

        my $src = PDF::API2->open('DX.pdf');
        my $pdf = PDF::API2::->new();

        for my $page_num (2 .. $src->page_count() - 1) {
            $pdf->import_page($src, $page_num, $page_num - 1);
        }

        $pdf->save('DXout2.pdf');

        I did a visual check of "DXout2.pdf": it looks the same as "DXout.pdf" but is substantially larger.

        $ ls -al
        ...
        -rw-r--r-- 1 ken None 3855 Mar 25 09:10 DX.pdf
        -rw-r--r-- 1 ken None 3277 Mar 25 09:42 DXout.pdf
        -rw-r--r-- 1 ken None 5525 Mar 25 10:46 DXout2.pdf
        ...

        If the OP chooses that route, an improvement on my simplistic solution may be worth investigating.

        Edit (fixed typo): s/API::PDF2/PDF::API2/

        — Ken
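If the PDF::API2 route works out, the loop above could be wrapped so that it drops the first and last page of any document instead of hard-coding nine pages. This is only a sketch, assuming the same PDF::API2 calls already used in this thread (open, page_count, import_page, save). The names inner_pages and trim_first_and_last are hypothetical, and it has not been run against the OP's file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: page numbers to keep when dropping the first
# and last page of an N-page document (empty list if N < 3).
sub inner_pages {
    my ($count) = @_;
    return $count >= 3 ? (2 .. $count - 1) : ();
}

# Hypothetical wrapper around the import_page loop from the post above.
sub trim_first_and_last {
    my ($in, $out) = @_;
    require PDF::API2;    # loaded lazily so inner_pages() works without it
    my $src  = PDF::API2->open($in);
    my $dest = PDF::API2->new();
    my @keep = inner_pages($src->page_count());
    die "$in has too few pages to trim\n" unless @keep;
    # Same target numbering as the original loop: 1, 2, 3, ...
    my $target = 0;
    $dest->import_page($src, $_, ++$target) for @keep;
    $dest->save($out);
    return scalar @keep;
}

# Usage: perl trim.pl DX.pdf DXout2.pdf
if (@ARGV == 2) {
    my $kept = trim_first_and_last(@ARGV);
    print "Wrote $kept pages to $ARGV[1]\n";
}
```

Keeping the page arithmetic in its own function means it can be checked without any PDF on disk.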

Re: CAM::PDF Error: Expected identifier label
by LanX (Saint) on Mar 24, 2023 at 20:07 UTC
    > I also tried PDF::API2, but the install fails with a PadWalker install fail.

    That's very weird!

    PadWalker is used for things like debugging, introspection or metaprogramming, but not in this kind of production code.

    The PDF::API2 page on MetaCPAN doesn't list PadWalker as a dependency, and I have no idea why it should.

    Maybe try replicating this on another clean machine or another Perl installation to be sure your environment is clean.

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

        My hunch is that two Perl installations got mixed, with both showing up in the environment, and the C compiler gets confused.

        Cheers Rolf
        (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
        Wikisyntax for the Monastery
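One way to test the "two mixed Perl installations" hunch is to print what the running interpreter actually sees. The script below is a diagnostic sketch, not a fix. It uses only core modules (Config, File::Spec), the helper names are made up, and the environment variables it prints are ones commonly set by PERL5LIB or local::lib setups, which may not apply on the OP's machine.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Config;
use File::Spec;

# Every "perl" executable reachable through PATH, in search order.
sub perls_on_path {
    my @found;
    for my $dir (File::Spec->path()) {
        my $candidate = File::Spec->catfile($dir, 'perl');
        push @found, $candidate if -f $candidate && -x $candidate;
    }
    return @found;
}

# Version and file location of a module, or undef if it cannot be loaded.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return { version => $module->VERSION, path => $INC{$file} };
}

print "Running interpreter: $^X (perl $Config{version})\n";
print "perl on PATH:        $_\n" for perls_on_path();
print "Compiler in Config:  $Config{cc}\n";

# Variables that can redirect module lookup or installation.
for my $var (qw(PERL5LIB PERL_LOCAL_LIB_ROOT PERL_MM_OPT PERL_MB_OPT)) {
    print "$var=$ENV{$var}\n" if defined $ENV{$var};
}

for my $module (qw(CAM::PDF PDF::API2 PadWalker)) {
    my $info = module_info($module);
    if ($info) {
        my $version = defined $info->{version} ? $info->{version} : 'unknown';
        print "$module $version loaded from $info->{path}\n";
    }
    else {
        print "$module is not loadable by this interpreter\n";
    }
}
```

If more than one perl shows up on PATH, or the module paths point into a different tree than the running interpreter, that would support the mixed-installation theory.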

Re: CAM::PDF Error: Expected identifier label
by LanX (Saint) on Mar 24, 2023 at 19:32 UTC
    Did you already check with a PDF from another source?

    Maybe the format of your "often" processed PDFs is somehow corrupt.

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery
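Before swapping in a PDF from another source, a quick look at the raw bytes of the failing file can show whether it is obviously damaged or uses newer features. The sketch below relies only on general PDF facts: the "%PDF-M.N" header, the "%%EOF" trailer, and the /ObjStm and /XRef stream types introduced in PDF 1.5. The function name pdf_quick_check is made up, and passing these checks does not prove that a file is valid or that CAM::PDF can parse it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: report a few basic facts about raw PDF bytes.
sub pdf_quick_check {
    my ($bytes) = @_;
    my %report;
    # The header "%PDF-M.N" should sit at the very start of the file.
    ($report{version}) = $bytes =~ /\A%PDF-(\d+\.\d+)/;
    # The end-of-file marker should be the last non-blank content.
    $report{has_eof} = $bytes =~ /%%EOF\s*\z/ ? 1 : 0;
    # Object streams and cross-reference streams arrived with PDF 1.5.
    $report{object_streams} = $bytes =~ m{/Type\s*/ObjStm} ? 1 : 0;
    $report{xref_streams}   = $bytes =~ m{/Type\s*/XRef}   ? 1 : 0;
    return \%report;
}

# Usage: perl pdfcheck.pl DX.pdf
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    my $r = pdf_quick_check($bytes);
    printf "header version: %s\n", defined $r->{version} ? $r->{version} : 'not found';
    printf "%-15s %s\n", "$_:", $r->{$_} ? 'yes' : 'no'
        for qw(has_eof object_streams xref_streams);
}
```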

Re: CAM::PDF Error: Expected identifier label
by Anonymous Monk on Mar 24, 2023 at 23:55 UTC

    Interesting... I think

    ${$c} =~ m/ \G (%.*\n\s*)+ /xgc;

    may be injected at line 1298 (and also at 1603 (for good measure, won't hurt))

      Why is it interesting, and how does it differ from the actual code? Is it a safety modification or a bugfix?
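Read on its own, the suggested line only skips PDF comments. Anchored at the current parse position by \G, it consumes one or more runs of "%" up to end of line plus any following whitespace, and with /c a failed match leaves pos() where it was. The error text in the question shows "% 1" where a "/Label" was expected, which is consistent with a comment sitting inside a dictionary. The stand-alone illustration below uses a made-up buffer rather than the OP's file and does not touch CAM::PDF. Whether lines 1298 and 1603 are the right places for it depends on the installed CAM::PDF source, which has not been checked here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical buffer: a comment ("% 1") sits where a dictionary key is expected.
my $content = "<< % 1\n/Parent 1 0 R >>";
my $c = \$content;

# Consume the opening "<<" and whitespace, leaving pos() at the comment.
${$c} =~ m/ \G << \s* /xgc or die "no dictionary start";
my $before = pos(${$c});

# The suggested line: skip comment lines and the whitespace after them.
${$c} =~ m/ \G (%.*\n\s*)+ /xgc;
my $after = pos(${$c});

# A simplified stand-in for "read the next /Label" (not CAM::PDF's own regex).
my $next = ${$c} =~ m/ \G \/ ([^\s<>\/\[\]()]+) /xgc ? $1 : '(none)';

print "skipped ", $after - $before, " characters; next label: $next\n";
# prints: skipped 4 characters; next label: Parent
```

Without the comment-skipping line, the label match would be attempted at the "%" and fail, which is the kind of failure the error message describes.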
