Doing some XML again after a long time and trying out XML::Twig.
Here's example code running on a node from haukex.
I was looking for a more generic way than writing handlers for each tag and found the ->simplify method, which looks good enough for that task. (Yeah, I know XML::Simple is evil, but so seems the monastery's output too ;-p )
use strict;
use warnings;
use Data::Dump qw/pp dd/;
use XML::Twig;

# Slurp the sample node XML from the __DATA__ section
my $data = join "", <DATA>;

$\ = "\n";    # append a newline to every print

print "=== HANDLER:\n";

# Approach 1: one handler per element of interest
my $twig = XML::Twig->new(
    twig_handlers => {
        'field[@name="doctext"]' => sub {
            print $_->gi, "Post: ", $_->child_text(0);
        },
        'author' => sub {
            print "ID: ",   $_->att("id");
            print "Name: ", $_->child_trimmed_text(0);
        },
    },
);
$twig->parse($data);

print "=== SIMPLIFIED:\n";

# Approach 2: convert the whole tree into a nested data structure
$twig = XML::Twig->new();
print pp $twig->parse($data)->simplify();

__DATA__
<?xml version="1.0" encoding="Windows-1252"?>
<node id="11100665" title="Re^5: What does $_ = qq~&quot;$_&quot;~ do?" created="2019-05-28 16:28:57" updated="2019-05-28 16:28:57">
<type id="11">
note</type>
<author id="830549">
haukex</author>
<data>
<field name="doctext">
&lt;p&gt;More fun facts! I once wrote a script to search a word list for words that make valid regexen which convert one valid word into another.&lt;/p&gt;
&lt;c&gt;
$ perl -le 'print bangs =~s engender'
bands
$ perl -le 'print halved =~s avatar'
halted
$ perl -le 'print stove =~s evener'
stone
&lt;/c&gt;
</field>
<field name="root_node">
11100593</field>
<field name="parent_node">
11100640</field>
<field name="reputation">
21</field>
</data>
</node>
What I don't like are the leading newlines in many content fields, as in content => "\nhaukex".
=== HANDLER:

ID: 830549
Name: haukex
fieldPost: 
<p>More fun facts! I once wrote a script to search a word list for words that make valid regexen which convert one valid word into another.</p>
<c>
$ perl -le 'print bangs =~s engender'
bands
$ perl -le 'print halved =~s avatar'
halted
$ perl -le 'print stove =~s evener'
stone
</c>

=== SIMPLIFIED:

{
  author => { 830549 => { content => "\nhaukex" } },
  created => "2019-05-28 16:28:57",
  data => {
    field => {
      doctext => {
        content => "\n<p>More fun facts! I once wrote a script to search a word list for words that make valid regexen which convert one valid word into another.</p>\n<c>\n\$ perl -le 'print bangs =~s engender'\nbands\n\$ perl -le 'print halved =~s avatar'\nhalted\n\$ perl -le 'print stove =~s evener'\nstone\n</c>\n",
      },
      parent_node => { content => "\n11100640" },
      reputation => { content => "\n21" },
      root_node => { content => "\n11100593" },
    },
  },
  title => "Re^5: What does \$_ = qq~\"\$_\"~ do?",
  type => { 11 => { content => "\nnote" } },
  updated => "2019-05-28 16:28:57",
}
I couldn't find an option for ->simplify(%options) to trim the content.
When writing handlers I had to use child_trimmed_text(0) to get rid of them.
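One possible workaround, sketched here as a post-processing step rather than an XML::Twig feature, is to walk the structure that ->simplify returns and trim every string value. The helper name trim_strings is made up for this illustration, and the sample hash is just a cut-down copy of the dump above, so the snippet runs without XML::Twig installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Twig): recursively strip leading
# and trailing whitespace from every string value of a nested
# hash/array structure, modifying it in place. Hash keys are left alone.
sub trim_strings {
    my ($node) = @_;
    my @values = ref $node eq 'HASH'  ? \( values %$node )
               : ref $node eq 'ARRAY' ? \( @$node )
               :                        ();
    for my $ref (@values) {
        if ( ref $$ref ) {
            trim_strings($$ref);            # descend into nested containers
        }
        elsif ( defined $$ref ) {
            $$ref =~ s/^\s+|\s+$//g;        # trim both ends of the string
        }
    }
    return $node;
}

# Cut-down copy of the simplified structure shown above
my $tree = {
    author => { 830549 => { content => "\nhaukex" } },
    type   => { 11     => { content => "\nnote" } },
    data   => {
        field => {
            reputation => { content => "\n21" },
            root_node  => { content => "\n11100593" },
        },
    },
};

trim_strings($tree);
print "author: '$tree->{author}{830549}{content}'\n";
```

With the real data this would presumably be called as trim_strings( $twig->parse($data)->simplify() ). Note that it also trims the trailing newline of the doctext content, which may or may not be wanted.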
Question:
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice
In reply to XML::Twig and the monasteries XML by LanX