http://qs1969.pair.com?node_id=11135437

pryrt has asked for the wisdom of the Perl Monks concerning the following question:

I want to automate changing some settings in an app which uses XML config files, but I am not an XML expert and don't have real experience with any of the XML modules (other than knowing from reading here that I need to avoid XML::Simple). I like starting from "known good" code as examples, and playing around until I understand it better. I found some of haukex's examples with XML::Rules, especially "Re^6: XML compare with a key" and "Re: How do I get a list in a perl hash generated from an XML?", which got me to the point where I could parse the XML into an initial data structure that seems reasonable to me.

But my next goal was to round-trip the config file: to see if I could get an output file that's compatible with the input, so it's still usable as a config file for the app. So far, I've got this short example:

#!perl
use 5.012; # strict, //
use warnings;
use Data::Dump;
use XML::Rules;

my $xml_doc = <<EOXML;
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
    <group name="blah">
        <!-- important instructions for group "blah" -->
        <tag/>
    </group>
    <group name="second">
        <!-- important instructions for group "second" -->
        <differentTag/>
    </group>
</root>
EOXML

my $parser = XML::Rules->new(
    stripspaces => 3|4,
    rules => [
        _default => 'raw',
    ],
);
#dd
my $data = $parser->parse($xml_doc);
print my $out = $parser->ToXML($data, 0, "    ", "") . "\n";

__DATA__
<root>
    <group name="blah">
        <tag/>
    </group>
    <group name="second">
        <differentTag/>
    </group>
</root>

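... But there are two things I haven't figured out how to do, as evidenced by the differences between the input text and the output text: how to keep the XML prolog (<?xml version="1.0" encoding="UTF-8" ?>), and how to keep the comments.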

So, is XML::Rules the right choice for this? (And if so, how do I accomplish it?) If not, which module is better equipped for my goals? (And could you provide a similar example, showing how to round-trip through the data structure and still have prolog and comments?)

Thank you.

edit: fixed missing sentence separator and missing paragraph indicators; fix title

Re: XML round-trip with comments and prolog
by Fletch (Bishop) on Jul 28, 2021 at 15:51 UTC

    Vague guessing (fortunately I haven't had to seriously diddle with XML in years), but maybe helpful: your first item is a "processing instruction". It looks like XML::Rules sits on top of XML::Parser::Expat, which says that it emits "Proc" or "Comment" events when it encounters the things you mention, so if there's some way to wire XML::Rules up to be called for those events, that might be what you want. Alternately, you may need to resort to using that lower-level module directly.

    Edit: event phrasing.
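    A minimal sketch of the "use the lower-level module directly" route, assuming XML::Parser (the usual front end to XML::Parser::Expat) is installed. It registers only XMLDecl and Comment handlers and records each comment together with the innermost open element, using the Expat object's current_element method, which is the kind of information a round-trip would need to put comments back in place. The sample document is trimmed from the question; the @events layout is made up for illustration.

```perl
#!perl
use 5.012;    # strict, //
use warnings;
use XML::Parser;

# Sample input, trimmed from the question.
my $xml_doc = <<'EOXML';
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
    <group name="blah">
        <!-- important instructions for group "blah" -->
        <tag/>
    </group>
</root>
EOXML

my @events;    # hypothetical layout: [ event type, enclosing element, text ]

my $p = XML::Parser->new(
    Handlers => {
        # XMLDecl handler receives (expat, version, encoding, standalone)
        XMLDecl => sub {
            my ($expat, $version, $encoding, $standalone) = @_;
            push @events, [ 'XMLDecl', '', "version=$version encoding=" . ($encoding // '') ];
        },
        # Comment handler receives (expat, comment text without the <!-- -->)
        Comment => sub {
            my ($expat, $text) = @_;
            # current_element is the innermost open element, undef outside the root
            my $where = $expat->current_element // '(top level)';
            push @events, [ 'Comment', $where, $text ];
        },
    },
);
$p->parse($xml_doc);

printf "%-8s %-12s %s\n", @$_ for @events;
```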


      At first, I thought that might get too low-level for me. But then I saw that XML::Rules->new has a handlers => {...} option, which allows defining handlers for XML::Parser::Expat events. Some experimentation with a dummy callback that handles all of them shows that Comment and XMLDecl are the events I want during parsing.

#!perl
use 5.012; # strict, //
use warnings;
use Data::Dump;
use XML::Rules;

my $xml_doc = <<EOXML;
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
    <group name="blah">
        <!-- important instructions for group "blah" -->
        <tag/>
    </group>
    <group name="second">
        <!-- important instructions for group "second" -->
        <differentTag/>
    </group>
</root>
EOXML

my $callback = sub {
    my ($name, $parser, @args) = @_;
    print STDERR "event:", $name//'<undef>', "(";
    print STDERR join ', ', map {defined($_) ? qq("$_") : '<undef>'} @args;
    print STDERR ")\n";
};

my %handlers = ();
for my $h ( qw/Comment XMLDecl/ ) { #qw/Start End Char Proc Comment CdataStart CdataEnd Default Unparsed Notation ExternEnt ExternEntFin Entity Element Attlist Doctype DoctypeFin XMLDecl/) {
    $handlers{$h} = sub { $callback->($h => @_) }
}

my $parser = XML::Rules->new(
    stripspaces => 3|4,
    rules => [
        _default => 'raw',
    ],
    handlers => \%handlers,
);
#dd
my $data = $parser->parse($xml_doc);
print my $out = $parser->ToXML($data, 0, "    ", "") . "\n";

__DATA__
event:XMLDecl("1.0", "UTF-8", <undef>)
event:Comment(" important instructions to manual editors ")
event:Comment(" important instructions for group "blah" ")
event:Comment(" important instructions for group "second" ")
<root>
    <group name="blah">
        <tag/>
    </group>
    <group name="second">
        <differentTag/>
    </group>
</root>

      Now that I've got that far, I should be able to get the prolog and comments into the data object (by returning values, instead of just printing messages). But the harder part will be getting ->ToXML() to write them back out. I may have to subclass XML::Rules to get additional output for my comment and prolog data items; if anyone has an easier idea than that, feel free to let me know.

        I should be able to get the prolog and comments into the data object

        I was optimistic. I had assumed that handler return values would be treated like rule return values, so that whatever a handler returned would be added to the data structure, as with a custom rule. That didn't happen. And given haukex's advice that round-tripping is difficult, this might not be the right path after all. If anyone has advice for continuing down this path, I will definitely experiment with it, no matter which direction I go from here.
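        One way to keep going down this path, sketched under assumptions: since handler return values are ignored, the handlers can write into closure variables instead, and the collected prolog can be printed ahead of the ToXML() output. This assumes, as the event dump above suggests, that the first argument XML::Rules passes to a handler is the underlying XML::Parser::Expat object, so its current_element method can tell top-level comments from nested ones. Comments inside elements are still dropped here; putting those back is the part that would need something like the subclassing idea.

```perl
#!perl
use 5.012;    # strict, //
use warnings;
use XML::Rules;

my $xml_doc = <<'EOXML';
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
    <group name="blah">
        <!-- important instructions for group "blah" -->
        <tag/>
    </group>
</root>
EOXML

# Closure variables the handlers write into (their return values are ignored).
my ($decl, @top_comments);

my $parser = XML::Rules->new(
    stripspaces => 3|4,
    rules       => [ _default => 'raw' ],
    handlers    => {
        XMLDecl => sub {
            my ($expat, $version, $encoding, $standalone) = @_;
            $decl = qq{<?xml version="$version"}
                  . (defined $encoding ? qq{ encoding="$encoding"} : '')
                  . '?>';
        },
        Comment => sub {
            my ($expat, $text) = @_;
            # Assumption: $expat is the XML::Parser::Expat object; undef element = outside root.
            # Nested comments are simply skipped in this sketch.
            push @top_comments, "<!--$text-->" unless defined $expat->current_element;
        },
    },
);

my $data = $parser->parse($xml_doc);
my $out  = join "\n", grep { defined } $decl, @top_comments,
    $parser->ToXML($data, 0, "    ", "");
print "$out\n";
```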

Re: XML round-trip with comments and prolog
by haukex (Archbishop) on Jul 28, 2021 at 18:31 UTC
    But my next goal was to round-trip the config file

    My experience with XML::Rules was that, sadly, it is not good at round-tripping XML files (Update: see also Jenda's comments here). IMHO, if that's what you want, I'd just go with XML::LibXML; in conjunction with its XPath support, it's not too difficult to pull values out of the config file while maintaining the original XML file's structure. An alternative might be XML::Twig, though my experience with that is more limited.
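    A minimal sketch of that approach, using a hypothetical config snippet modeled on the question's data (the value attribute on <tag> is invented for illustration): findvalue() reads an attribute through an XPath expression, findnodes() returns the element so it can be edited in place, and toString() on the document serializes everything back out, XML declaration and comments included.

```perl
#!perl
use 5.012;    # strict, //
use warnings;
use XML::LibXML;

# Hypothetical config snippet; the value attribute is made up for this example.
my $xml_doc = <<'EOXML';
<?xml version="1.0" encoding="UTF-8"?>
<!-- important instructions to manual editors -->
<root>
  <group name="blah">
    <!-- important instructions for group "blah" -->
    <tag value="1"/>
  </group>
</root>
EOXML

my $dom = XML::LibXML->load_xml(string => $xml_doc);

# findvalue() returns the string value of whatever the XPath selects
my $old = $dom->findvalue('//group[@name="blah"]/tag/@value');

# findnodes() returns the actual element, which can be modified in place
my ($tag) = $dom->findnodes('//group[@name="blah"]/tag');
$tag->setAttribute(value => 2);

# Serializing the whole document keeps the declaration and the comments
my $out = $dom->toString();
print "old value: $old\n", $out;
```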

      Thanks. I have occasionally muddled through XPath terms for a very specific experiment, but it just doesn't stick with me, so I was hoping to avoid it. But I guess I'll try going down that path for this; since this will likely be more than a one-time script for me, maybe I'll remember more of the concepts (especially XPath) that I learn this time through.

      The good news is, I can at least get a super-simple round-trip with comments:

...
use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => $xml_doc);
print $dom->toString(), "-----\n";   # works!

      ... so that's something. ;-)
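      Taking that one step further to an actual file on disk, a sketch in which a throwaway temporary file (via File::Temp) stands in for the app's real config: load_xml(location => ...) reads the file, and the document's toFile() method writes the serialized tree back, declaration and comments included.

```perl
#!perl
use 5.012;    # strict, //
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

# Stand-in for the app's config file: a temporary file removed at exit.
my ($fh, $config_file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} <<'EOXML';
<?xml version="1.0" encoding="UTF-8"?>
<!-- important instructions to manual editors -->
<root>
  <group name="blah"><tag/></group>
</root>
EOXML
close $fh;

# Load from disk instead of from a string
my $dom = XML::LibXML->load_xml(location => $config_file);

# ... edit nodes here ...

# Write the document back to the same file
$dom->toFile($config_file);

# Read it back to confirm the declaration and comment survived
open my $in, '<', $config_file or die "Cannot read $config_file: $!";
my $round_trip = do { local $/; <$in> };
close $in;
print $round_trip;
```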

Re: XML round-trip with comments and prolog
by pryrt (Abbot) on Jul 30, 2021 at 13:24 UTC

    To close out this discussion, here's an example using XML::LibXML that takes the example data, finds a specific node, edits that node, adds a new node after it, and outputs the final XML to complete the round trip.

#!perl -l
use 5.012; # strict, //
use warnings;
use Data::Dump;
use XML::LibXML;

my $xml_doc = <<EOXML;
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
    <group name="somethingElse">
        <Item MenuEntryName="Edit" MenuItemName="Copy"/>
    </group>
    <group name="contextMenu">
        <!-- still commented -->
        <Item MenuEntryName="Edit" MenuItemName="Cut"/>
        <Item MenuEntryName="Edit" MenuItemName="Copy"/>
        <Item MenuEntryName="Edit" MenuItemName="Paste"/>
    </group>
</root>
EOXML

# process XML
my $dom = XML::LibXML->load_xml(string => $xml_doc, no_blanks => 1);

# find specific item and its parent
my ($desiredItem) = $dom->findnodes('//group[@name="contextMenu"]/Item[@MenuItemName="Copy"]');
printf "%-23s %s\n", "Found:", $desiredItem->toString();
my $itemParent = $desiredItem->parentNode;
printf "%-23s %s\n", "Parent:", $itemParent->toString(1);

# prove that I can edit the item
$desiredItem->setAttribute( comment => 'edited' );
printf "%-23s %s\n", "Edited:", $desiredItem->toString();
printf "%-23s %s\n", "Parent:", $itemParent->toString(1);

# create item and show it's not added to parent yet
my $newItem = XML::LibXML::Element->new('Item');
$newItem->setAttribute( MenuEntryName => "Edit");
$newItem->setAttribute( MenuItemName => "Delete");
$newItem->setAttribute( comment => "created");
printf "%-23s %s\n", "Created:", $newItem->toString();
printf "%-23s %s\n", "Parent:", $itemParent->toString(1);

# add item
$itemParent->insertAfter($newItem, $desiredItem);
printf "%-23s %s\n", "Added to Parent:", $itemParent->toString(1);

# verify it's there and edit it
my ($foundNew) = $dom->findnodes('//group[@name="contextMenu"]/Item[@comment="created"]');
printf "%-23s %s\n", "Found:", $foundNew->toString();
$foundNew->setAttribute( comment => 'inserted' );
printf "%-23s %s\n", "Edited:", $foundNew->toString();
printf "%-23s %s\n", "Parent:", $itemParent->toString(1);

# finish with the Round Trip output (re-indent libxml2's 2-space levels to 4 spaces)
my $str = $dom->toString(1);
$str =~ s/(^|\G)(  |\t)/    /gm;
print "\n\nRound Trip output:\n-----\n$str\n=====\n\n";

__END__
... snipped intermediate output ...
Round Trip output:
-----
<?xml version="1.0" encoding="UTF-8"?>
<!-- important instructions to manual editors -->
<root>
    <group name="somethingElse">
        <Item MenuEntryName="Edit" MenuItemName="Copy"/>
    </group>
    <group name="contextMenu">
        <!-- still commented -->
        <Item MenuEntryName="Edit" MenuItemName="Cut"/>
        <Item MenuEntryName="Edit" MenuItemName="Copy" comment="edited"/>
        <Item MenuEntryName="Edit" MenuItemName="Delete" comment="inserted"/>
        <Item MenuEntryName="Edit" MenuItemName="Paste"/>
    </group>
</root>
=====

    So thanks all for the helpful suggestions.

      Note that non-ancient XML::LibXML supports the hash syntax for attributes, so you can replace your setAttribute calls with
$desiredItem->{comment} = 'edited';
...
@$newItem{qw{ MenuEntryName MenuItemName comment }} = qw{ Edit Delete created };
...
$foundNew->{comment} = 'inserted';
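      A small self-contained sketch of that hash syntax, reading and writing attributes on a made-up one-group document; it assumes an XML::LibXML recent enough to provide the element hash overload mentioned above.

```perl
#!perl
use 5.012;    # strict, //
use warnings;
use XML::LibXML;

# Made-up minimal document for the demonstration
my $dom = XML::LibXML->load_xml(
    string => '<group><Item MenuEntryName="Edit" MenuItemName="Copy"/></group>');
my ($item) = $dom->findnodes('//Item');

# Read and write single attributes through the hash overload
my $name = $item->{MenuItemName};      # same value as getAttribute('MenuItemName')
$item->{comment} = 'edited';           # same effect as setAttribute(comment => 'edited')

# A hash slice sets several attributes on a new element at once
my $newItem = XML::LibXML::Element->new('Item');
@$newItem{qw{ MenuEntryName MenuItemName comment }} = qw{ Edit Delete created };
$item->parentNode->appendChild($newItem);

print "$name\n", $dom->documentElement->toString(), "\n";
```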
