BernieC has asked for the wisdom of the Perl Monks concerning the following question:

I just need to read a PDF file {not change it and not write it}. I'm using the PDF module. I have a valid PDF file {it opens without a problem}. But when I try to mess with it from Perl I get a mess. With this little program:
#!/usr/bin/perl
# $Id: template.pl 1.13 2022/05/13 12:17:31 bernie Exp $

use v5.10 ;
use strict;
use warnings ;
use Getopt::Std ;
# getopts(<flags>, \%args) ;
my %args ;

use PDF;

my $pdf = PDF->new("test.pdf") or die "";
say "is a PDF" if $pdf->IsaPDF ;
say "Has ".$pdf->Pages." Pages" ;
exit ;
I get
D:\Desktop>pdftest
Premature end of file reached at D:\Desktop\pdftest.pl line 13.
Bad object reference '' at D:\Desktop\pdftest.pl line 13.
Bad object reference '' at D:\Desktop\pdftest.pl line 13.
Bad object reference '' at D:\Desktop\pdftest.pl line 13.
is a PDF
Use of uninitialized value in concatenation (.) or string at D:\Desktop\pdftest.pl line 15.
Has  Pages
Line 13 is the "new" method and it doesn't die. Is there some other PDF module that would do what I need? Thanks

Replies are listed 'Best First'.
Re: Help with PDF module [comparison]
by kcott (Archbishop) on May 06, 2023 at 16:09 UTC

    G'day BernieC,

    Here's a quick comparison of PDF-related modules.

    PDF
    As ++marto states, this is 23 years old and has active bugs. Probably abandonware; I'd suggest avoiding this one.
    CAM::PDF
    Suggested by AM. This is 10 years old, has a lot of bugs, and is probably abandonware. See "CAM::PDF Error: Expected identifier label" for an example problem and discussion.
    PDF::API2
    I'm most familiar with this one. It was last updated just 6 months ago. I did a quick test for the page count you were trying (code mostly just copied from the SYNOPSIS). I'm not sure what else you might want; perhaps "METADATA METHODS" is of interest.
    $ perl -E '
        use strict;
        use warnings;
        use PDF::API2;

        my $pdf = PDF::API2->new();
        my $font = $pdf->font("Helvetica-Bold");

        for my $p (1 .. 10) {
            my $page = $pdf->page();
            my $text = $page->text();
            $text->font($font, 20);
            $text->position(200, 700);
            $text->text("Page: $p");
        }

        $pdf->save("test.pdf");
    '
    $ file test.pdf
    test.pdf: PDF document, version 1.4, 10 pages
    $ perl -E '
        use strict;
        use warnings;
        use PDF::API2;

        my $pdf = PDF::API2->open("test.pdf");
        say "Page count: ", $pdf->page_count();
    '
    Page count: 10
    PDF::Builder
    Suggested by marto. It was last updated just 4 months ago. I hadn't encountered this one previously. Its SYNOPSIS is almost identical to PDF::API2's. It may be a branch of PDF::API2 that's intended to provide improvements or enhancements; it mentions PDF::API2 a few times by way of comparison; I didn't see anything regarding a branch but I also didn't study the docs in detail. It has "METADATA METHODS" too.

    See its README.md for possible hurdles to using this, such as requiring Perl v5.24; having said that, it installed first time for me using the cpan utility (I have Perl v5.36.0).

    Given the similarities, I just repeated the test I did previously, replacing API2 with Builder and test.pdf with test2.pdf. At least in this respect, PDF::API2 and PDF::Builder function identically. If anyone has other information re PDF::API2 vs. PDF::Builder, please add comments.

    $ perl -E '
        use strict;
        use warnings;
        use PDF::Builder;

        my $pdf = PDF::Builder->new();
        my $font = $pdf->font("Helvetica-Bold");

        for my $p (1 .. 10) {
            my $page = $pdf->page();
            my $text = $page->text();
            $text->font($font, 20);
            $text->position(200, 700);
            $text->text("Page: $p");
        }

        $pdf->save("test2.pdf");
    '
    $ file test2.pdf
    test2.pdf: PDF document, version 1.4, 10 pages
    $ perl -E '
        use strict;
        use warnings;
        use PDF::Builder;

        my $pdf = PDF::Builder->open("test2.pdf");
        say "Page count: ", $pdf->page_count();
    '
    Page count: 10

    Update (additional information): I just noticed that the PDF produced by PDF::Builder is substantially bigger than that produced by PDF::API2. I would have expected them to be almost the same size.

    ken@titan ~/tmp/pm_11152014_pdf
    $ ls -l
    total 16
    -rw-r--r-- 1 ken None 4272 May  7 00:50 test.pdf
    -rw-r--r-- 1 ken None 7024 May  7 01:05 test2.pdf
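For the read-only use case in the original question, the open() and page_count() calls shown in the tests above could be wrapped so that an unreadable file gives one clear message instead of a string of warnings. This is only a minimal sketch: the helper names first_available and pdf_page_count are made up for illustration, and it assumes that open() throws an exception on a file it cannot parse.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the first class in the list that loads, or undef.
sub first_available {
    for my $class (@_) {
        # String eval so a class name held in a variable can be required.
        return $class if eval "require $class; 1";
    }
    return;
}

# Hypothetical helper: open $file via $class and return its page count.
# Nothing is written back, because save() is never called.
# Returns undef (after a warning) if the file cannot be opened or parsed.
sub pdf_page_count {
    my ($class, $file) = @_;
    my $count = eval { $class->open($file)->page_count() };
    warn "Could not read '$file' with $class: $@" unless defined $count;
    return $count;
}

# Only runs when a file name is given on the command line.
if (my $file = shift @ARGV) {
    my $class = first_available(qw(PDF::API2 PDF::Builder))
        or die "Neither PDF::API2 nor PDF::Builder is installed\n";
    my $count = pdf_page_count($class, $file);
    say "$file: $count page(s) according to $class" if defined $count;
}
```

Saved under a name of your choosing and run with a PDF path as its only argument, it should print the page count, or a single warning if the file cannot be read.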

    — Ken

      "If anyone has other information re PDF::API2 vs. PDF::Builder, please add comments."

      Thanks to a private message from pryrt: "PDF::Builder::Docs - additional documentation for Builder module". (Pity you can't give a /msg a ++.)

      In particular, the History section, which describes the relationship between PDF::API2 and PDF::Builder: similar to what I guessed, but it's a lot more involved.

      "... repeated the test ... replacing API2 with Builder ..."

      Apparently, not just a bit of luck. From the same section: 'At least initially, any program written based on PDF::API2 should be convertible to PDF::Builder simply by changing "API2" anywhere it occurs to "Builder".' — so, for anyone wishing to upgrade applications from PDF::API2 to PDF::Builder, it's possibly as easy as a simple global change: s/API2/Builder/g.
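A rough sketch of that global change, applied to source text held in a string. The helper name api2_to_builder is hypothetical, and the pattern is deliberately narrower than a bare s/API2/Builder/g: it anchors on the PDF:: prefix so that unrelated occurrences of "API2" are left alone. Given the "at least initially" caveat in the docs, converted code may still need manual review.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite PDF::API2 references in Perl source text.
sub api2_to_builder {
    my ($source) = @_;
    # Copy first, then substitute, so the caller's string is not modified.
    # Sub-packages such as PDF::API2::Content are renamed as well.
    (my $converted = $source) =~ s/\bPDF::API2\b/PDF::Builder/g;
    return $converted;
}

my $before = 'use PDF::API2; my $pdf = PDF::API2->open("test.pdf");';
print api2_to_builder($before), "\n";
```

For files on disk, the same substitution could be run as an in-place one-liner with perl -pi -e, ideally on a copy or under version control.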

      — Ken

      "Here's a quick comparison of PDF-related modules"

      Wait, I don't see any comparison beyond an (incomplete) enumeration; looks like you forgot to append the results. Oh well, here they are (some of them, for a start), using the simple test file generated with the code you kindly provided:

      use strict;
      use warnings;
      use PDF::API2;
      use CAM::PDF;
      use Benchmark 'cmpthese';

      my $fn = 'test.pdf';
      # Slurp the whole file: @ARGV becomes ($fn) and $/ becomes undef
      my $str = do { local ( @ARGV, $/ ) = $fn; <> };

      # -1: run each entry for at least 1 CPU second
      cmpthese -1, {
          'PDF::API2' => sub { PDF::API2-> from_string( $str )-> page_count },
          'CAM::PDF'  => sub { CAM::PDF-> new( $str )-> numPages },
          'CAM::PDF+' => sub {
              my $d = CAM::PDF-> new( $str );
              $d-> cacheObjects;
              $d-> numPages
          },
      };

      __END__
                  Rate PDF::API2 CAM::PDF+  CAM::PDF
      PDF::API2  163/s        --      -74%      -95%
      CAM::PDF+  614/s      277%        --      -83%
      CAM::PDF  3586/s     2106%      484%        --

      The 'plussed' entry (a kind of "parse everything") is for those who may have (reasonable) doubts about whether one module (guess which) makes a harder effort to extract a lot more info initially, to provide the user with a richer environment for inspecting things more cosily (or something like that); but in fact, both (nonplussed) seem to extract approximately the same amount of info on open. It's just that one parser (guess which) is very poor indeed.

Re: Help with PDF module
by marto (Cardinal) on May 06, 2023 at 13:46 UTC
Re: Help with PDF module
by Anonymous Monk on May 06, 2023 at 13:46 UTC