http://qs1969.pair.com?node_id=11135438


in reply to XML round-trip with comments and prolog

Vague guessing (fortunately I haven't had to seriously diddle with XML in years), but maybe helpful: your first item is a "processing instruction". It looks like XML::Rules sits on top of XML::Parser::Expat, which says that it emits "Proc" events or "Comment" events when it encounters the items you mention. If there's some way to wire XML::Rules up so that it gets called for those events, that might be what you want. Alternatively, you may need to resort to using the lower-level module directly.
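For illustration only, here is a minimal sketch of the "use the lower-level module directly" route. It assumes XML::Parser::Expat is installed; the sample document, the `my-target` instruction, and the `@events` array are made up for the example and are not from the original question. It only records which events fire and does not attempt any round-trip.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

# Hypothetical sample input; the declaration must be the very first thing in the string.
my $xml = q{<?xml version="1.0" encoding="UTF-8"?>
<?my-target some data?>
<!-- a comment -->
<root><tag/></root>
};

my @events;    # hypothetical collector for the events we care about

my $expat = XML::Parser::Expat->new;
$expat->setHandlers(
    # Each handler receives the Expat object first, then event-specific arguments.
    XMLDecl => sub {
        my ($p, $version, $encoding, $standalone) = @_;
        push @events, [ XMLDecl => $version, $encoding, $standalone ];
    },
    Proc => sub {
        my ($p, $target, $data) = @_;
        push @events, [ Proc => $target, $data ];
    },
    Comment => sub {
        my ($p, $text) = @_;
        push @events, [ Comment => $text ];
    },
);
$expat->parse($xml);
$expat->release;    # break circular references held by the parser

for my $ev (@events) {
    my ($name, @args) = @$ev;
    print "$name: ", join(', ', map { defined $_ ? qq("$_") : '<undef>' } @args), "\n";
}
```

Note that the `<?xml ... ?>` line should arrive through the XMLDecl handler rather than Proc, so a handler for each is registered here.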

Edit: event phrasing.

The cake is a lie.

Re^2: XML round-trip with comments and
by pryrt (Abbot) on Jul 28, 2021 at 17:52 UTC

    At first, I thought that might get too low-level for me. But then I saw that XML::Rules->new has a handlers => {...} option, which allows defining handlers for XML::Parser::Expat events. Some experimentation with a dummy callback that handles all of them shows that Comment and XMLDecl are the events I want during parsing.

    #!perl
    use 5.012; # strict, //
    use warnings;
    use Data::Dump;
    use XML::Rules;

    my $xml_doc = <<EOXML;
    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- important instructions to manual editors -->
    <root>
        <group name="blah">
            <!-- important instructions for group "blah" -->
            <tag/>
        </group>
        <group name="second">
            <!-- important instructions for group "second" -->
            <differentTag/>
        </group>
    </root>
    EOXML

    my $callback = sub {
        my ($name, $parser, @args) = @_;
        print STDERR "event:", $name//'<undef>', "(";
        print STDERR join ', ', map {defined($_) ? qq("$_") : '<undef>'} @args;
        print STDERR ")\n";
    };

    my %handlers = ();
    for my $h ( qw/Comment XMLDecl/ ) { #qw/Start End Char Proc Comment CdataStart CdataEnd Default Unparsed Notation ExternEnt ExternEntFin Entity Element Attlist Doctype DoctypeFin XMLDecl/) {
        $handlers{$h} = sub { $callback->($h => @_) }
    }

    my $parser = XML::Rules->new(
        stripspaces => 3|4,
        rules => [
            _default => 'raw',
        ],
        handlers => \%handlers,
    );

    #dd
    my $data = $parser->parse($xml_doc);
    print my $out = $parser->ToXML($data, 0, " ", "") . "\n";

    __DATA__
    event:XMLDecl("1.0", "UTF-8", <undef>)
    event:Comment(" important instructions to manual editors ")
    event:Comment(" important instructions for group "blah" ")
    event:Comment(" important instructions for group "second" ")
    <root>
     <group name="blah">
      <tag/>
     </group>
     <group name="second">
      <differentTag/>
     </group>
    </root>

    Now that I've got that far, I should be able to get the prolog and comments into the data object (by returning values, instead of just printing messages). But the harder part will be how to get ->ToXML() to do something on the output. I may have to subclass XML::Rules to get additional outputs for my comment and prolog data items -- if anyone has an easier idea than that, feel free to let me know.

      I should be able to get the prolog and comments into the data object

      I was optimistic. I mistakenly assumed that the handler return expectations would match the rule return expectations, so that whatever a handler returned would be added to the data structure, much like the return value of a custom rule. That didn't happen. And given haukex's advice that round-tripping is difficult, this might not be the right path after all. If anyone has advice for continuing down this path, I will definitely experiment with it, no matter which direction I go from here.
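One possible workaround, sketched here with no claim that it is the intended XML::Rules approach: since handler return values are not merged into the data structure, the handlers can instead write into a lexical variable captured by closure. The `%side` hash and the `decl_to_string` helper are hypothetical names invented for this sketch; the `XML::Rules->new`, `parse`, and `ToXML` calls are copied from the script above. It restores only the declaration, because the comments are collected without their positions.

```perl
use strict;
use warnings;
use XML::Rules;

# Same shape as the sample document above; the declaration must start the string.
my $xml_doc = q{<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
  <group name="blah">
    <!-- important instructions for group "blah" -->
    <tag/>
  </group>
  <group name="second">
    <!-- important instructions for group "second" -->
    <differentTag/>
  </group>
</root>
};

# Hypothetical side channel: filled by the handlers, read after parsing.
my %side = ( decl => undef, comments => [] );

my $parser = XML::Rules->new(
    stripspaces => 3|4,
    rules       => [ _default => 'raw' ],
    handlers    => {
        XMLDecl => sub {
            my ($p, $version, $encoding, $standalone) = @_;
            $side{decl} = { version => $version, encoding => $encoding, standalone => $standalone };
        },
        Comment => sub {
            my ($p, $text) = @_;
            push @{ $side{comments} }, $text;    # position in the tree is NOT recorded
        },
    },
);

# Hypothetical helper: rebuild a declaration from the captured pieces.
sub decl_to_string {
    my ($decl) = @_;
    return '' unless $decl;
    my $s = qq(<?xml version="$decl->{version}");
    $s .= qq( encoding="$decl->{encoding}") if defined $decl->{encoding};
    $s .= sprintf(' standalone="%s"', $decl->{standalone} ? 'yes' : 'no')
        if defined $decl->{standalone};
    return "$s?>";
}

my $data = $parser->parse($xml_doc);

print decl_to_string($side{decl}), "\n";
print $parser->ToXML($data, 0, " ", ""), "\n";
print STDERR "captured comment:$_\n" for @{ $side{comments} };
```

The rebuilt declaration is equivalent to the original but not byte-identical (the space before `?>` is lost, for instance), and putting each comment back where it came from would still need extra bookkeeping, which is the hard part of the round-trip.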