Reading page label (name) from PDF file...

squirly has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, looking for a way of reading the Page Label from a PDF in Perl; and if that's not possible then a way of adding / reading metadata.

Problem I'm working to solve is that I'm using a PDF file as a 'template', and then reading in another file of 'data', and then based on the dynamic data, the script is keyed to extract a page from the template and insert into a new pdf (each template page being potentially recycled n times for each completed output pdf).

This is working, and for proof of concept I keyed the script to use page numbers in the template PDF (for example, page 1 is a basic contents page template, and page 2 is a legalese disclaimer page that is added at the end of a set of pages for each member). However, I need a way of telling the script which pages are which for the eventual production environment; it won't always be an arbitrary page 1 = this, page 2 = that, and sometimes the template may have up to six different pages, each with a different purpose.

Was thinking an easy way of addressing this was to give each page a label and then keying the script off of the label names, however I can't figure out how to extract the page label from the input pdf template; only how to set a label.

Currently using the PDF::API2 library, however I am open to using any library in order to read and extract the page_label=page_number information (which will be stored in an eventual hash / associative array).

Appreciate any and all assistance, and I apologize for rudimentary post, it's my first attempt reaching out.

Thank you!

Re: Reading page label (name) from PDF file... by poj (Abbot) on May 25, 2016 at 06:38 UTC
Could you use a Document Property, e.g. Keywords?

```perl
#!/usr/bin/perl
use strict;
use PDF::API2;

# A4
my $pdf = PDF::API2->new( width => 595, height => 842, );
my $page = $pdf->page;
my $txt  = $page->text;
my $font = $pdf->corefont('Times-Roman');
$txt->font( $font, 32 );
$txt->translate( 100, 500 );
$txt->text( 'Test PDF with Keywords' );

# store the page names in the Keywords document property
my @pages = qw(
  PageName1 PageName2 PageName3
  PageName4 PageName5 PageName6
  PageName7 PageName8 PageName9 );
$pdf->info( 'Keywords' => join ';', @pages );
$pdf->saveas('template.pdf');

# reopen the file and read the page names back
my $pdf1 = PDF::API2->open('template.pdf');
my %info = $pdf1->info();
my @names = split ';', $info{'Keywords'};    # renamed: @pages is already declared above
print join "\n", @names;
```

poj
Re^2: Reading page label (name) from PDF file... by Anonymous Monk on May 25, 2016 at 18:14 UTC
Thank you! That works! My original direction was to attempt to label the pages in order to get a handle on them; adding keyword metadata to describe the pages works equally well. I really appreciate your help. I used your code snippet and adapted it to read some custom tags in the form of "stmnt1=1;disclaimer=2;etc" and it works!

```perl
# retrieve page labels from pdf (specified in pdf keywords section as
# pagelabel=#;pagelabel=#;pagelabel=#;etc...)
my %info = $pdf_template->info();
my %page_labels = ();
my @attributes;
if (defined $info{'Keywords'}) {
    my @pages = split ';', $info{'Keywords'};
    foreach (@pages) {
        @attributes = split "=", $_;
        $page_labels{$attributes[0]} = $attributes[1];
        if (($debugging != 0) || ($verbose != 0)) {
            print "$attributes[0]: $attributes[1]\n";
        }
    }
    undef @pages;
}
undef %info;
undef @attributes;
```
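For anyone who wants to try the parsing step without a PDF to hand, here is a minimal sketch of the same idea as a standalone sub. The name `parse_page_labels` is hypothetical (it is not part of PDF::API2 or the code above), and it assumes the Keywords string has the "label=page;label=page" shape with no spaces around the separators. It skips entries that have no "=" or a non-numeric page, which the loop above would store as undef.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::API2): turn a Keywords string such as
# "stmnt1=1;disclaimer=2" into a hash of label => page number.
sub parse_page_labels {
    my ($keywords) = @_;
    my %page_labels;
    return %page_labels unless defined $keywords;

    for my $entry (split /;/, $keywords) {
        # Split on the first "=" only, giving at most two fields.
        my ($label, $page) = split /=/, $entry, 2;

        # Skip empty or malformed entries (e.g. a trailing ";" or "label=").
        next unless defined $label && length $label;
        next unless defined $page  && $page =~ /^\d+$/;

        $page_labels{$label} = $page;
    }
    return %page_labels;
}

# Usage with a literal string standing in for $info{'Keywords'}.
my %labels = parse_page_labels('stmnt1=1;disclaimer=2');
print "$_: $labels{$_}\n" for sort keys %labels;
```

With the real template, the argument would be `$info{'Keywords'}` as read in the snippet above; the resulting hash can then be used to look up the template page number for a given label.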