PerlMonks  

Re: XML round-trip with comments and prolog

by Fletch (Bishop)
on Jul 28, 2021 at 15:51 UTC (#11135438)


in reply to XML round-trip with comments and prolog

Vague guessing (fortunately I haven't had to seriously diddle w/XML in years) that may be helpful: your first item is a "processing instruction". It looks like XML::Rules sits on top of XML::Parser::Expat, which says it emits "Proc" or "Comment" events when it encounters the items you mention. If there's some way to wire the former module up to be called for the latter items, that might be what you want. Alternatively, you may need to resort to using that lower-level module directly.
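For illustration, here is a minimal sketch of what "using that lower-level module directly" could look like. The sample document and the `@events` list are made up for this example. One thing to note: as far as the XML::Parser::Expat documentation goes, the `<?xml ... ?>` declaration is reported through a separate XMLDecl event, while Proc fires for ordinary processing instructions, so the sketch registers both.

```perl
#!perl
use 5.012;    # strict, //
use warnings;
use XML::Parser::Expat;

# Hypothetical sample document, not the original poster's file.
my $xml = <<'EOXML';
<?xml version="1.0" encoding="UTF-8"?>
<?editor-hint keep-comments?>
<!-- note for manual editors -->
<root><tag/></root>
EOXML

my @events;    # collected [event name, payload] pairs, in document order

my $expat = XML::Parser::Expat->new;
$expat->setHandlers(
    # XMLDecl receives the parser, version, encoding and standalone flag
    XMLDecl => sub {
        my ($p, $version, $encoding, $standalone) = @_;
        push @events, [ XMLDecl => join ' ', $version // '', $encoding // '' ];
    },
    # Proc receives the parser, the instruction's target and its data
    Proc => sub {
        my ($p, $target, $data) = @_;
        push @events, [ Proc => "$target $data" ];
    },
    # Comment receives the parser and the comment text
    Comment => sub {
        my ($p, $text) = @_;
        push @events, [ Comment => $text ];
    },
);
$expat->parse($xml);
$expat->release;    # drop the parser's internal circular references

print "$_->[0]: $_->[1]\n" for @events;
```

Collecting the events in a list rather than printing from inside the handlers keeps the document order available for whatever re-emits them later.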

Edit: event phrasing.

The cake is a lie.
The cake is a lie.
The cake is a lie.

Re^2: XML round-trip with comments and prolog
by pryrt (Monsignor) on Jul 28, 2021 at 17:52 UTC

    At first, I thought that might get too low-level for me. But then I saw that XML::Rules->new accepts a handlers => {...} option, which allows defining handlers for XML::Parser::Expat events. Some experimentation with a dummy callback to handle them all shows that Comment and XMLDecl are the events I want during parsing.

    #!perl
    use 5.012; # strict, //
    use warnings;
    use Data::Dump;
    use XML::Rules;

    my $xml_doc = <<EOXML;
    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- important instructions to manual editors -->
    <root>
        <group name="blah">
            <!-- important instructions for group "blah" -->
            <tag/>
        </group>
        <group name="second">
            <!-- important instructions for group "second" -->
            <differentTag/>
        </group>
    </root>
    EOXML

    my $callback = sub {
        my ($name, $parser, @args) = @_;
        print STDERR "event:", $name//'<undef>', "(";
        print STDERR join ', ', map {defined($_) ? qq("$_") : '<undef>'} @args;
        print STDERR ")\n";
    };

    my %handlers = ();
    # full list tried while experimenting:
    # qw/Start End Char Proc Comment CdataStart CdataEnd Default Unparsed Notation ExternEnt ExternEntFin Entity Element Attlist Doctype DoctypeFin XMLDecl/
    for my $h ( qw/Comment XMLDecl/ ) {
        $handlers{$h} = sub { $callback->($h => @_) }
    }

    my $parser = XML::Rules->new(
        stripspaces => 3|4,
        rules => [
            _default => 'raw',
        ],
        handlers => \%handlers,
    );
    #dd
    my $data = $parser->parse($xml_doc);
    print my $out = $parser->ToXML($data, 0, " ", "") . "\n";

    __DATA__
    event:XMLDecl("1.0", "UTF-8", <undef>)
    event:Comment(" important instructions to manual editors ")
    event:Comment(" important instructions for group "blah" ")
    event:Comment(" important instructions for group "second" ")
    <root>
     <group name="blah">
      <tag/>
     </group>
     <group name="second">
      <differentTag/>
     </group>
    </root>

    Now that I've got that far, I should be able to get the prolog and comments into the data object (by returning values instead of just printing messages). The harder part will be getting ->ToXML() to emit them on output. I may have to subclass XML::Rules to get additional outputs for my comment and prolog data items -- if anyone has an easier idea than that, feel free to let me know.

      I should be able to get the prolog and comments into the data object

      I was optimistic. I had assumed that handler return values would be treated like rule return values, so that whatever a handler returned would be added to the data structure, as with a custom rule. That didn't happen. And given haukex's advice that round-tripping is difficult, this might not be the right path after all. If anyone has advice for continuing down this path, I will definitely experiment with it, no matter which direction I go from here.
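Since handler return values are not merged into the parsed data, one possible workaround is to have the handlers store what they see in variables they close over. The sketch below is only an illustration under that assumption: `%prolog`, `@comments` and `$decl` are names invented here, the document is a shortened copy of the one in the post above, and the constructor options are the ones already used there. It recovers the declaration and the comment texts but not where each comment sat in the tree, so it is not a full round-trip. XML::Rules may also rely on some Expat events internally, so it is worth checking its documentation before overriding them.

```perl
#!perl
use 5.012;    # strict, say
use warnings;
use XML::Rules;

# Shortened copy of the sample document from the post above.
my $xml_doc = <<'EOXML';
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
  <group name="blah">
    <!-- important instructions for group "blah" -->
    <tag/>
  </group>
</root>
EOXML

# Side storage filled by the handlers; the parsed data never contains it.
my %prolog;
my @comments;

my $parser = XML::Rules->new(
    stripspaces => 3|4,
    rules       => [ _default => 'raw' ],
    handlers    => {
        # Expat-style arguments: the parser first, then the event data.
        XMLDecl => sub {
            my ($p, $version, $encoding, $standalone) = @_;
            %prolog = (version => $version, encoding => $encoding);
            return;
        },
        Comment => sub {
            my ($p, $text) = @_;
            push @comments, $text;
            return;
        },
    },
);

my $data = $parser->parse($xml_doc);

# Rebuild the declaration by hand (assumes a version was present).
my $decl = qq(<?xml version="$prolog{version}");
$decl .= qq( encoding="$prolog{encoding}") if defined $prolog{encoding};
$decl .= "?>";

say $decl;
say "<!--$_-->" for @comments;    # document order only; tree positions are lost
```

Printing `$decl` ahead of the ->ToXML() output would restore the prolog without subclassing; putting comments back at their original positions would still need extra bookkeeping, such as recording the enclosing element in the Comment handler.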
