
As you no doubt realise, the OpenOffice document format is just a zip file containing (amongst other things) a file called content.xml, which holds the actual content of your document. So you can open it up with Archive::Zip and then use whatever XML manipulation tool you like on it.
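As a minimal sketch of that round trip, assuming Archive::Zip is installed: the helper below pulls content.xml out of the archive, hands the raw XML to a callback (so you can use whatever XML tool you like), and writes the updated archive to a new file. The name `edit_odf_content` and the file names in the usage comment are hypothetical, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Archive::Zip qw(:ERROR_CODES);

# Hypothetical helper: read content.xml from the ODF (zip) file $in_file,
# pass its raw XML to $edit (a code ref that returns the new XML), then
# write the whole archive, with the updated content.xml, to $out_file.
sub edit_odf_content {
    my ($in_file, $out_file, $edit) = @_;

    my $zip = Archive::Zip->new();
    $zip->read($in_file) == AZ_OK
        or die "Cannot read '$in_file' as a zip archive\n";

    # In scalar context, contents() returns the uncompressed member data
    my $xml = $zip->contents('content.xml');
    defined $xml
        or die "No content.xml in '$in_file'\n";

    my $new_xml = $edit->($xml);
    defined $new_xml
        or die "Edit callback returned undef\n";

    # Replace only this member's data; the other members are left as they were
    $zip->contents('content.xml', $new_xml);

    # Write to a different file: unchanged members are still read from
    # $in_file while the new archive is written (Archive::Zip's overwrite()
    # is intended for rewriting the original file instead)
    $zip->writeToFileNamed($out_file) == AZ_OK
        or die "Cannot write '$out_file'\n";
    return;
}

# Example usage (file names are placeholders):
# edit_odf_content('template.odt', 'filled-in.odt', sub {
#     my ($xml) = @_;
#     # ... manipulate $xml with XML::LibXML or any other tool ...
#     return $xml;
# });
```

Passing the XML as a plain string keeps the zip handling separate from whichever XML library does the editing.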

I would probably tend toward using XML::LibXML. If you edit the document in OpenOffice and assign a unique style to each block of text that you might want to replace or remove, then you can find those nodes with an XPath expression that matches the style. You then have the DOM manipulation methods at your disposal to edit them.
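To make the edit step concrete, here is a minimal sketch using the same text namespace and style-matching XPath as the example that follows. The function `replace_styled_text` and its convention of passing `undef` to delete a paragraph are my own illustration, not something XML::LibXML provides. Note that replacing the text this way discards any formatted runs (such as `text:span` elements) inside the paragraph.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

my $TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

# Hypothetical helper: find every <text:p> in $doc whose style is $style.
# If $new_text is defined, replace the paragraph's contents with it;
# if it is undef, remove the paragraph altogether.
# Returns the number of paragraphs affected.
sub replace_styled_text {
    my ($doc, $style, $new_text) = @_;

    # The style name is interpolated into an XPath string literal,
    # so it must not contain a double quote
    die "Unsupported style name '$style'\n" if $style =~ /"/;

    my $xc = XML::LibXML::XPathContext->new($doc);
    $xc->registerNs(text => $TEXT_NS);

    my @paras = $xc->findnodes(qq{//text:p[\@text:style-name="$style"]});
    for my $p (@paras) {
        if (defined $new_text) {
            # Drops all child nodes, including any text:span formatting
            $p->removeChildNodes();
            $p->appendText($new_text);
        }
        else {
            $p->parentNode->removeChild($p);
        }
    }
    return scalar @paras;
}
```

For example, `replace_styled_text($doc, 'VariableTextSurname', 'Smith')` would fill in the surname paragraph, and `$doc->toString()` then gives the XML to store back in the zip.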

In this example, I've skipped the Archive::Zip step and included the content.xml directly in the __DATA__ section, but it illustrates finding a paragraph by matching on its style (in this case I used a style called 'VariableTextSurname'):

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_fh(\*DATA);

my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs( text => 'urn:oasis:names:tc:opendocument:xmlns:text:1.0' );

my $xpath = q{//text:p[@text:style-name="VariableTextSurname"]};
foreach my $p ($xc->findnodes($xpath)) {
    print "Found a variable para\n " . $p->to_literal . "\n";
    # could do e.g.: $p->parentNode->removeChild($p);
}

# After manipulations, serialise back to XML with:
#   my $xml = $doc->toString();

exit;

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
  xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
  xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
  xmlns:math="http://www.w3.org/1998/Math/MathML"
  xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
  xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
  xmlns:ooo="http://openoffice.org/2004/office"
  xmlns:ooow="http://openoffice.org/2004/writer"
  xmlns:oooc="http://openoffice.org/2004/calc"
  xmlns:dom="http://www.w3.org/2001/xml-events"
  xmlns:xforms="http://www.w3.org/2002/xforms"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:rpt="http://openoffice.org/2005/report"
  xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
  xmlns:rdfa="http://docs.oasis-open.org/opendocument/meta/rdfa#"
  xmlns:field="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0"
  xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0"
  office:version="1.2"
><office:scripts
/><office:font-face-decls
><style:font-face style:name="Times New Roman" svg:font-family="&apos;Times New Roman&apos;"
  style:font-family-generic="roman" style:font-pitch="variable"
/><style:font-face style:name="Arial" svg:font-family="Arial"
  style:font-family-generic="swiss" style:font-pitch="variable"
/><style:font-face style:name="DejaVu Sans" svg:font-family="&apos;DejaVu Sans&apos;"
  style:font-family-generic="system" style:font-pitch="variable"
/></office:font-face-decls
><office:automatic-styles
/><office:body
><office:text
><text:sequence-decls
><text:sequence-decl text:display-outline-level="0" text:name="Illustration"
/><text:sequence-decl text:display-outline-level="0" text:name="Table"
/><text:sequence-decl text:display-outline-level="0" text:name="Text"
/><text:sequence-decl text:display-outline-level="0" text:name="Drawing"
/></text:sequence-decls
><text:p text:style-name="Standard"
>Paragraph One</text:p
><text:p text:style-name="VariableTextSurname"
>Paragraph Two</text:p
><text:p text:style-name="Standard"
>Paragraph Three</text:p
></office:text
></office:body
></office:document-content
>
```

(I added some extra whitespace inside the XML tags for readability.)

You could add your own attributes to the XML to mark those blocks instead, but they seem to disappear if you then edit and save the document in OpenOffice, which is why I'd use styles as the markers.