Don't use XML::Simple!

perl -e "use Data::Dumper; use XML::Rules; print Dumper(XML::Rules::inferRulesFromExample('c:\temp\inventors.xml'))"
prints:

$VAR1 = {
    'inventors' => 'no content',
    'number' => 'as is',
    'inventor' => 'as array no content',
    'city,country,name,upper-name' => 'content'
};

With rules like this XML::Rules would produce a data structure like this:

{
    'inventors' => {
        'inventor' => [
            {
                'country' => 'GB',
                'city' => 'Aston Clinton',
                'upper-name' => 'ANDY BARTH',
                'number' => { '_content' => '1', 'type' => 'integer' },
                'name' => 'Andy Barth'
            },
            {
                'country' => 'GB',
                'city' => 'Aylesbury',
                'upper-name' => 'DANIELE DALL\'ACQUA',
                'number' => { '_content' => '2', 'type' => 'integer' },
                'name' => 'Daniele Dall\'Acqua'
            },
            {
                'country' => 'GB',
                'city' => 'Calne',
                'upper-name' => 'NIGEL DREW',
                'number' => { '_content' => '3', 'type' => 'integer' },
                'name' => 'Nigel Drew'
            }
        ],
        'type' => 'array'
    }
};

Now, I do not care about the 'type' => 'integer'; I'd rather get just the content of the <number> tag as well, so let's change the rule for that tag to 'content'. This changes the structure to:

{
    'inventors' => {
        'inventor' => [
            {
                'country' => 'GB',
                'city' => 'Aston Clinton',
                'upper-name' => 'ANDY BARTH',
                'number' => '1',
                'name' => 'Andy Barth'
            },
            {
                'country' => 'GB',
                'city' => 'Aylesbury',
                'upper-name' => 'DANIELE DALL\'ACQUA',
                'number' => '2',
                'name' => 'Daniele Dall\'Acqua'
            },
            {
                'country' => 'GB',
                'city' => 'Calne',
                'upper-name' => 'NIGEL DREW',
                'number' => '3',
                'name' => 'Nigel Drew'
            }
        ],
        'type' => 'array'
    }
};

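If you want to try this rule set without the original file, here is a minimal sketch that parses an inline document. The XML below is reconstructed from the dumps above (the real inventors.xml is not shown in this thread), so its exact shape is an assumption, and only two of the inventors are included.

```perl
use strict;
use warnings;
use XML::Rules;

# Assumed input: reconstructed from the dumped structure, not the real file.
my $xml = <<'XML';
<inventors type="array">
  <inventor>
    <number type="integer">1</number>
    <name>Andy Barth</name>
    <upper-name>ANDY BARTH</upper-name>
    <city>Aston Clinton</city>
    <country>GB</country>
  </inventor>
  <inventor>
    <number type="integer">2</number>
    <name>Daniele Dall'Acqua</name>
    <upper-name>DANIELE DALL'ACQUA</upper-name>
    <city>Aylesbury</city>
    <country>GB</country>
  </inventor>
</inventors>
XML

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules       => {
        'inventors' => 'no content',
        'inventor'  => 'as array no content',
        # <number> now uses 'content' too, so its type attribute is dropped
        'number,city,country,name,upper-name' => 'content',
    },
);

my $data = $parser->parse($xml);

# Each <inventor> is one hash in the array under {inventors}{inventor}.
for my $inv ( @{ $data->{inventors}{inventor} } ) {
    print "$inv->{number}: $inv->{name} ($inv->{city}, $inv->{country})\n";
}
```

The array keeps the inventors in document order, which is handy when you only ever loop over them.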
Better, but I can do even better. If I know I want to get the inventors by number I can change the rule for the <inventor> tag to 'by number' and get a hash instead of an array:

{
    'inventors' => {
        '1' => {
            'country' => 'GB',
            'city' => 'Aston Clinton',
            'upper-name' => 'ANDY BARTH',
            'name' => 'Andy Barth'
        },
        '3' => {
            'country' => 'GB',
            'city' => 'Calne',
            'upper-name' => 'NIGEL DREW',
            'name' => 'Nigel Drew'
        },
        'type' => 'array',
        '2' => {
            'country' => 'GB',
            'city' => 'Aylesbury',
            'upper-name' => 'DANIELE DALL\'ACQUA',
            'name' => 'Daniele Dall\'Acqua'
        }
    }
};

In that case, getting the name of inventor #1 is just $data->{inventors}{1}{name}. If the XML contains just the inventors, I can get rid of the 'inventors' level by changing its rule to 'pass', and it'd be just $data->{1}{name}. I also do not want the 'type' => 'array', so let's add " remove(type)" to the rule for <inventors>.

use Data::Dumper;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        'inventors' => 'no content remove(type)',
        'inventor' => 'by number',
        'number,city,country,name,upper-name' => 'content'
    }
);

my $data = $parser->parsefile('c:\temp\inventors.xml');
#print Dumper($data);
print "The 1st inventor was $data->{inventors}{1}{name}\n";
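
Since the structure is now a hash keyed by number, listing the inventors in order means sorting the keys. A minimal sketch, again using an inline XML string that is reconstructed from the dumps (an assumption, not the real inventors.xml) and only two of the inventors:

```perl
use strict;
use warnings;
use XML::Rules;

# Assumed input: reconstructed from the dumped structure, not the real file.
my $xml = <<'XML';
<inventors type="array">
  <inventor>
    <number type="integer">3</number>
    <name>Nigel Drew</name>
    <upper-name>NIGEL DREW</upper-name>
    <city>Calne</city>
    <country>GB</country>
  </inventor>
  <inventor>
    <number type="integer">1</number>
    <name>Andy Barth</name>
    <upper-name>ANDY BARTH</upper-name>
    <city>Aston Clinton</city>
    <country>GB</country>
  </inventor>
</inventors>
XML

# Same rules as in the script above.
my $parser = XML::Rules->new(
    stripspaces => 7,
    rules       => {
        'inventors' => 'no content remove(type)',
        'inventor'  => 'by number',
        'number,city,country,name,upper-name' => 'content',
    },
);

my $data = $parser->parse($xml);

# Hash keys come back unordered, so sort them numerically before printing.
for my $number ( sort { $a <=> $b } keys %{ $data->{inventors} } ) {
    print "$number: $data->{inventors}{$number}{name}\n";
}
```

The trade-off against the array form is direct lookup by number at the cost of losing document order.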

Jenda


In reply to Re: XML::Simple parsing help by Jenda
in thread XML::Simple parsing help by jhoop
