in reply to Search OpenOffice document for title and author [Solved]

G'day jinnicky,

The module you'll want for that is OpenOffice::OODoc::Meta.

I haven't used it before. I installed OpenOffice::OODoc some time ago, and OpenOffice::OODoc::Meta, along with the other OpenOffice::OODoc::* modules, appears to be bundled with it (see OpenOffice-OODoc distribution details). So, if you have OpenOffice::OODoc, you probably also have the related modules.

The documentation looks good and usage seems straightforward.

I created a very basic text document for testing (pm_1156099_test.odt) and added a title ("PM 1156099 Test Document") via the Properties menu item. I then created this test script:

#!/usr/bin/env perl -l

use strict;
use warnings;

use OpenOffice::OODoc::Meta;

my $meta = OpenOffice::OODoc::Meta::->new(file => 'pm_1156099_test.odt');

print 'Author: ', $meta->creator();
print 'Title: ', $meta->title();
print 'Created: ', $meta->date();

This produced this output:

Author: Ken Cotterill
Title: PM 1156099 Test Document
Created: 2016-02-25T15:23:56

There's lots of other metadata you can access if you want.

[I do recall hearing something about OpenOffice::OODoc being superseded by ODF::lpOD. Both sets of modules are by the same author, Jean-Marie Gouarné, and the ODF::lpOD distribution is more recent. I looked around for some definitive information on this but was unsuccessful, so that remains unconfirmed: perhaps another monk can provide something more substantial on this matter.]

— Ken

Re^2: Search OpenOffice document for title and author
by jinnicky (Sexton) on Feb 25, 2016 at 16:54 UTC

    The metadata doesn't show the title I'm after, which is a paragraph in the body with the style 'Title' or an automatic style named P followed by digits.

    I installed ODF::lpOD once I figured out that lpOD starts with a lowercase L. Its documentation is voluminous and not much better than OpenOffice::OODoc's.

    However I did get it to work!

    #!/usr/bin/perl -w
    use strict;
    use ODF::lpOD;

    my $file = $ARGV[0];
    die "You must supply an odf file name\n" unless $file;

    my $doc = odf_document->get($file) or die "Can't load $file\n$!\n";
    my $context = $doc->body;
    my $meta = $doc->meta;

    # Doesn't do what I wanted
    my $title = $meta->get_title;
    print "Title: $title\n" if $title; # shows Title: c if at all

    my $p = $context->get_paragraph(style=>'Title', content=>'ODF',position=>0);
    if ($p) {print $p->get_text()."\n";}
    else {print "Not found\n";} # prints Not found

    #this works
    my ($i,$ps,$style,$pStyle);
    for ($i = 0; $i < 6; $i++) {
        $p = $context->get_paragraph(position=>$i);
        if ($p) {
            print "Paragraph $i";
            $ps = $p->get_style();
            if ($ps) {
                if ($ps =~ m/^P\d+$/) { # Check for internal styles
                    $style = $doc->get_style('paragraph',$ps) || '';
                    $pStyle = $style->get_parent_style if $style;
                    $ps = $pStyle if $pStyle; # This gets the real name
                }
                print " Style: $ps\n";
            }
            else {print "No Style\n";}
            my $data = $p->get_text(recursive=>1);
            if ($data) {print "$data\n";}
            else {print "--No data\n";}
        }
        else {
            print "Paragraph $i not found\n";
        }
    }
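
The automatic-style check in that loop can be tried on its own, without a document. This is only a minimal sketch: the %parent_of hash and the resolve_style helper are hypothetical names used for illustration, standing in for the get_style / get_parent_style lookup in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: automatic style name => parent (named) style.
# In a real document this mapping would come from the style lookup
# (get_style / get_parent_style in the script above).
my %parent_of = (
    P1 => 'Title',
    P2 => 'Text body',
);

# Return the named style for a paragraph style name. Automatic styles
# (P followed only by digits) are mapped to their parent when one is
# known; any other name is returned unchanged.
sub resolve_style {
    my ($name, $parents) = @_;
    return $name unless defined $name && $name =~ /^P\d+$/;
    return $parents->{$name} // $name;
}

for my $name (qw(P1 P2 P7 Title Preformatted)) {
    printf "%-12s => %s\n", $name, resolve_style($name, \%parent_of);
}
```

Anchoring the pattern with ^ and $ matters here: a named style that merely begins with P, such as Preformatted, is passed through untouched, and an automatic style with no known parent falls back to its own name.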

    Thanks Ken

    —Bob

      The ODF::lpOD modules worked on .odf files but choked on .sxw (version 1.1) files.

      Ken's suggestion about the similarity between those modules and the OpenOffice::OODoc modules helped me to go back to them and come up with this code:

      #!/usr/bin/perl
      use warnings;
      use strict;
      use OpenOffice::OODoc;

      # setup file and styles
      my $file = $ARGV[0];
      die "You must supply an odf file name\n" unless $file;
      print "File: $file\n";

      my $container = odfContainer($file)
          or die "Can't get document $file\n";
      my $doc = odfDocument(container => $container, part => 'content')
          or die "Can't get content in $file\n";

      my ($i,$element,$text,$style);
      for ($i = 0; $i < 10; $i++) {
          $element = $doc->getElement('//text:p',$i);
          if ($element) {
              $text = $doc->getText($element);
              if ($text) {
                  $style = $doc->getAttribute($element,'style name')||'';
                  if ($style && ($style =~ m/^P\d+$/)) {
                      $style = $doc->getAncestorStyle($style);
                  }
                  print "Paragraph $i: ";
                  print "($style) " if $style;
                  print "$text" if $text;
                  print "\n";
              }
          }
      }

      —Bob