srikrishnan has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
I have written the following Perl script, using the Image::ExifTool module, to add XMP metadata to PDF files. The process mostly works, but I have some problems:
I am not able to get “XMP” written in place of “XAP”. I am also not able to add the “xmp:identifier” and “prism:url” values to my PDF.
Below is my script:
    use strict;
    use warnings;
    use Image::ExifTool ':Public';

    my @CreatVal = ("Michael L. Oldham","Dheeraj Khare","Florante A. Quiocho","Amy L. Davidson","Jue Chen");
    my $DescrVal = "XXX 450, 515 (2007). doi:10.1038/XXX06264";
    my $FormtVal = "application/pdf";
    my $PublsVal = "XXX Publishing Group";
    my $RightVal = "© 2007 XXX Publishing Group";
    my $TitleVal = "Crystal structure of a catalytic intermediate of the maltose transporter";
    my $ProdcVal = "Acrobat Distiller 6.0.1 (Windows)";
    my $CpyrtVal = "© 2007 XXX Publishing Group";
    my $DoiiiVal = "10.1038/XXX06264";
    my $EissnVal = "1476-4687";
    my $EndPgVal = "521";
    my $IssnoVal = "0028-0836";
    my $NumbrVal = "7169";
    my $PubDtVal = "2007-11-22";
    my $PubNmVal = "XXX";
    my $RgtAgVal = "permissions\@XXX.com";
    my $SrtPgVal = "515";
    my $UrlllVal = "dx.doi.org/10.1038/XXX06264";
    my $VolumVal = "450";
    my $CrtDtVal = "2007-11-13T10:51:07+08:00";
    my $CrtTlVal = "3B2 Total Publishing System 7.51n/W";
    my $LabelVal = "XXX 450, 515 (2007). doi:10.1038/XXX06264";
    my $MDdatVal = "2007-11-13T12:19:19+08:00";
    my $MfyDtVal = "2007-11-13T12:19:19+08:00";
    my $DocIDVal = "uuid:27bf4dc2-daa2-46a0-9944-4aeea86cb8d0";
    my $InsIDVal = "uuid:27f7c7ea-bde3-49fd-b76b-0975741cc5d9";
    my $MarkdVal = "True";

    my $FileName = "E:\\3B2_Production_Problems\\XMP_Metadata\\Nnano.pdf";
    unlink "E:\\3B2_Production_Problems\\XMP_Metadata\\modified_Nnano.pdf";
    my $mdfyName = "E:\\3B2_Production_Problems\\XMP_Metadata\\modified_Nnano.pdf";
    my $success = "Nil";
    my $errStr = "Nil";

    my $exifTool = new Image::ExifTool ':Public';
    $exifTool->SetNewValue();
    foreach my $singleAu(@CreatVal) {
        ($success, $errStr) = $exifTool->SetNewValue('Creator'=> $singleAu, AddValue => 1);
        #print "Success: $success\n";
        #print "Error $errStr\n";
    }
    $exifTool->Options(Charset => 'Latin');
    $exifTool->SetNewValue('About' => 'doi:'.$DoiiiVal,Group=>'XMP-RDF', Protected=>0x01);
    $exifTool->SetNewValue('Description',$DescrVal);
    $exifTool->SetNewValue('Format',$FormtVal);
    $exifTool->SetNewValue('Identifier','doi:'.$DoiiiVal);
    $exifTool->SetNewValue('Publisher', $PublsVal);
    $exifTool->SetNewValue('Rights',$RightVal, Charset => 'Latin');
    $exifTool->SetNewValue('Title',$TitleVal);
    $exifTool->SetNewValue('Producer',$ProdcVal);
    #$exifTool->SetNewValue('XMP-RDF:About','doi:'.$DoiiiVal, Protected=>'0x01', Protected=>'0x02');
    $exifTool->SetNewValue('XMP-PRISM:Copyright' => $CpyrtVal, Charset => 'Latin');
    $exifTool->SetNewValue('DOI',$DoiiiVal);
    $exifTool->SetNewValue('EIssn',$EissnVal);
    $exifTool->SetNewValue('EndingPage',$EndPgVal);
    $exifTool->SetNewValue('ISSN',$IssnoVal);
    $exifTool->SetNewValue('Number',$NumbrVal);
    $exifTool->SetNewValue('PublicationDate',$PubDtVal);
    $exifTool->SetNewValue('PublicationName',$PubNmVal);
    $exifTool->SetNewValue('RightsAgent',$RgtAgVal);
    $exifTool->SetNewValue('StartingPage',$SrtPgVal);
    $exifTool->SetNewGroups('prism');
    $exifTool->SetNewValue('url',$UrlllVal, Group => 'prism');
    $exifTool->SetNewValue('Volume',$VolumVal);
    $exifTool->SetNewGroups('XMP');
    $exifTool->SetNewValue('CreateDate' => $CrtDtVal, Group => 'XMP');
    $exifTool->SetNewValue('CreatorTool' => $CrtTlVal, Group => 'XMP');
    $exifTool->SetNewValue('Identifier'=> 'doi:'.$DoiiiVal, Group => 'XMP');
    $exifTool->SetNewValue('Label'=> $LabelVal, Group => 'XMP');
    $exifTool->SetNewValue('MetadataDate' => $MDdatVal, Group => 'XMP');
    $exifTool->SetNewValue('ModifyDate' => $MfyDtVal, Group => 'XMP');
    $exifTool->SetNewValue('DocumentID' => $DocIDVal);
    $exifTool->SetNewValue('InstanceID' => $InsIDVal);
    $exifTool->SetNewValue('Marked',$MarkdVal);
    $exifTool->WriteInfo($FileName, $mdfyName);
Please advise me on how to solve this problem.

Thanks in Advance,

Srikrishnan

Re: Adding xmp metadata using exiftool module
by roboticus (Chancellor) on Dec 01, 2008 at 12:48 UTC
    srikrishnan:

    I'm sorry, but I'm unfamiliar with both Image::ExifTool and the internals of .pdf files. I'm writing just to offer a suggestion or two.

    First, I notice that you restrict your variable names to 8 characters. While I'm a fan of shorter variable names, I'm surprised that you're wasting three of your characters by using a suffix of 'Val' on (nearly) all variables. I'd drop that habit like a hot rock. It's much more important to make your variable names clear than short.

    For example, suppose you have a variable to represent a store owner's Social Security Number. If you're a Java programmer, you might wind up with StoreOwnersSocialSecurityNumber or some such, which is much too long for my comfort. It looks like you might try to pack it down to $StOSSVal. I tend to use standardized abbreviations in my code to shorten variable names. In any industry, you'll have a particular set of frequently used terms, and you should have (or create) a set of standard abbreviations for those terms. For example, a very common contraction of Social Security Number is SSN, so I'd use that throughout the application and shorten the variable to $StoreOwnerSSN, which is short and clear.

    Secondly, it appears that the majority of those variables are used only once. Looking at how you're using them suggests the use of a hash, something like:

        use strict;
        use warnings;
        use Image::ExifTool ':Public';

        my @PDFAuthors = (
            "Michael L. Oldham", "Dheeraj Khare", "Florante A. Quiocho",
            "Amy L. Davidson", "Jue Chen"
        );

        my %StdPDFattrs = (
            Description => "XXX 450, 515 (2007). doi:10.1038/XXX06264",
            Format      => "application/pdf",
            Publisher   => "XXX Publishing Group",
            Rights      => "© 2007 XXX Publishing Group",
            Charset     => 'Latin',
            Title       => "Crystal structure of a catalytic intermediate of the maltose transporter",
            Producer    => "Acrobat Distiller 6.0.1 (Windows)",
            Cpyrt       => "© 2007 XXX Publishing Group",
            Identifier  => "10.1038/XXX06264",
            EIssn       => "1476-4687",
            EndingPage  => "521",
            <<<SNIP>>>
        );

        <<<SNIP>>>

        my $exifTool = new Image::ExifTool ':Public';
        $exifTool->SetNewValue();
        for my $Author (@PDFAuthors) {
            ($success, $errStr) = $exifTool->SetNewValue('Creator'=> $Author, AddValue => 1);
            #print "Success: $success\n";
            #print "Error $errStr\n";
        }
        for my $Attribute (keys %StdPDFattrs) {
            $exifTool->SetNewValue( $Attribute, $StdPDFattrs{$Attribute} );
        }

        <<<SNIP>>>
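
    One caveat with a plain hash is that keys returns the tags in no guaranteed order, and there is no slot for per-tag options such as Group. A minimal sketch of an ordered alternative follows. It assumes a helper named apply_rows (hypothetical, not part of Image::ExifTool) and uses a stand-in setter so that it runs without a PDF; with the real module the setter would presumably wrap $exifTool->SetNewValue.

```perl
use strict;
use warnings;

# Each row is [ tag, value, { options } ]. An array keeps the order fixed,
# which a plain hash does not guarantee.
my @rows = (
    [ 'Description', 'XXX 450, 515 (2007). doi:10.1038/XXX06264' ],
    [ 'Format',      'application/pdf' ],
    [ 'Identifier',  'doi:10.1038/XXX06264', { Group => 'XMP' } ],
);

# apply_rows is a hypothetical helper: it calls $setter once per row and
# returns the tags for which the setter reported no success.
sub apply_rows {
    my ($setter, $rows) = @_;
    my @failed;
    for my $row (@$rows) {
        my ($tag, $value, $opts) = @$row;
        my ($count, $err) = $setter->($tag, $value, %{ $opts || {} });
        push @failed, $tag unless $count;
    }
    return @failed;
}

# Stand-in setter that only records the tag names it was given.
# With the real module one would pass: sub { $exifTool->SetNewValue(@_) }
my @seen;
my $setter = sub { push @seen, $_[0]; return (1, undef) };

my @failed = apply_rows($setter, \@rows);
print "set: @seen\n";
print "failed: @failed\n" if @failed;
```

    Collecting the failed tags in one place makes it easier to see which values were rejected, instead of uncommenting print statements one at a time.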

    One last suggestion--Don't blindly double-space all your code. Whitespace has a single function: to make the program easy to read/understand. So use indentation and whitespace to make program structure clear.

    Good luck with your problem!

    ...roboticus

    P.S. As I was editing your code, I noticed that you use 'XMP-RDF' in one place and 'XMP' elsewhere. Perhaps your problem with XAP vs. XMP is a consistency issue?
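
    A minimal sketch of what consistent group naming could look like: every tag name carries an explicit group prefix, and the return values of SetNewValue are checked so that a rejected tag is reported rather than silently skipped. The three tag names below are assumptions; whether each one is writable depends on the installed ExifTool version, which is exactly what the printed error string should reveal.

```perl
use strict;
use warnings;
use Image::ExifTool;

my $exifTool = Image::ExifTool->new;

# Tag names with an explicit group prefix. These particular names are
# assumptions and may not all be writable in every ExifTool version.
my @tags = (
    [ 'XMP-dc:Identifier',  'doi:10.1038/XXX06264' ],
    [ 'XMP-xmp:Identifier', 'doi:10.1038/XXX06264' ],
    [ 'XMP-prism:URL',      'dx.doi.org/10.1038/XXX06264' ],
);

for my $pair (@tags) {
    my ($tag, $value) = @$pair;
    # In list context SetNewValue returns the number of tags set
    # and an error string (undefined when there was no error).
    my ($count, $err) = $exifTool->SetNewValue($tag, $value);
    printf "%-20s set=%d%s\n", $tag, $count || 0,
        defined $err ? " error: $err" : '';
}
```

    Nothing is written to a file here; the loop only queues the values, so it can be run safely to find out which names the module accepts before calling WriteInfo.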

    Update: Repaired CPAN link for module Image::ExifTool.

    Update: Fixed %StdPDFattrs definition: I had square brackets instead of parentheses. ++ to moritz for the catch!

      Hi,

      Thanks for your valuable suggestions. I will try to follow them.

      Before that, just for your information:

      This is not the actual script; I have just tried to make a sample for our customer. If we get approval from them, we will write a script to extract all the data from the source files. We are using a typesetting software called "3B2", which only supports scalar variables (not arrays or hashes).

      Anyway, your comments are very useful tips for us.

      Thanks,
      Srikrishnan