At first, I thought that might get too low-level for me. But then I saw that XML::Rules->new accepts a handlers => {...} option, which allows defining handlers for XML::Parser::Expat events. Some experimentation with a dummy callback attached to every event shows that Comment and XMLDecl are the events I want during parsing.
#!perl
use 5.012; # strict, //
use warnings;
use Data::Dump;
use XML::Rules;
my $xml_doc = <<EOXML;
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
<group name="blah">
<!-- important instructions for group "blah" -->
<tag/>
</group>
<group name="second">
<!-- important instructions for group "second" -->
<differentTag/>
</group>
</root>
EOXML
my $callback = sub {
my ($name, $parser, @args) = @_;
print STDERR "event:", $name//'<undef>', "(";
print STDERR join ', ', map {defined($_) ? qq("$_") : '<undef>'} @args;
print STDERR ")\n";
};
my %handlers = ();
# full list of events: qw/Start End Char Proc Comment CdataStart CdataEnd Default Unparsed Notation
#   ExternEnt ExternEntFin Entity Element Attlist Doctype DoctypeFin XMLDecl/
for my $h ( qw/Comment XMLDecl/ ) {
$handlers{$h} = sub { $callback->($h => @_) }
}
my $parser = XML::Rules->new(
stripspaces => 3|4,
rules => [
_default => 'raw',
],
handlers => \%handlers,
);
#dd
my $data = $parser->parse($xml_doc);
print my $out = $parser->ToXML($data, 0, " ", "") . "\n";
__DATA__
event:XMLDecl("1.0", "UTF-8", <undef>)
event:Comment(" important instructions to manual editors ")
event:Comment(" important instructions for group "blah" ")
event:Comment(" important instructions for group "second" ")
<root>
<group name="blah">
<tag/>
</group>
<group name="second">
<differentTag/>
</group>
</root>
Now that I've got that far, I should be able to get the prolog and comments into the data object (by returning values, instead of just printing messages). The harder part will be getting ->ToXML() to emit them in the output. I may have to subclass XML::Rules to get additional outputs for my comment and prolog data items; if anyone has an easier idea than that, feel free to let me know.
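As a minimal sketch of the capturing step only: instead of relying on handler return values (whose treatment by XML::Rules has not been verified here), the handlers below close over lexical variables and store what they see. The names @comments and %xml_decl are made up for illustration; the argument lists match what the dummy callback printed above. This does nothing about re-emitting the items through ->ToXML().

```perl
#!perl
use strict;
use warnings;
use XML::Rules;

my $xml_doc = <<'EOXML';
<?xml version="1.0" encoding="UTF-8" ?>
<!-- important instructions to manual editors -->
<root>
<group name="blah">
<!-- important instructions for group "blah" -->
<tag/>
</group>
</root>
EOXML

# Hypothetical storage for the captured items (not part of XML::Rules).
my @comments;
my %xml_decl;

my $parser = XML::Rules->new(
    stripspaces => 3|4,
    rules       => [ _default => 'raw' ],
    handlers    => {
        # Expat passes the parser object first, then the comment text.
        Comment => sub {
            my ($expat, $text) = @_;
            push @comments, $text;
        },
        # Expat passes version, encoding and standalone (undef if absent).
        XMLDecl => sub {
            my ($expat, $version, $encoding, $standalone) = @_;
            %xml_decl = (
                version    => $version,
                encoding   => $encoding,
                standalone => $standalone,
            );
        },
    },
);

my $data = $parser->parse($xml_doc);

# Show what was captured alongside the normal parse.
printf STDERR "xml version=%s encoding=%s\n",
    $xml_decl{version}  // '<undef>',
    $xml_decl{encoding} // '<undef>';
print STDERR "comment: [$_]\n" for @comments;
```

This keeps the comments in document order but loses their position relative to the elements, so it is only enough for the prolog-level items; the per-group comments would still need to be tied to their parent tag somehow.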