Looks like XML::Simple gets a lot of bad press here. I guess it stems from the fact that most people fail to read the docs, even though they are short. Let's see what the problems with XML::Simple are.

First, the data structure it produces is not always consistent. E.g., for XML like this:

<root> <tag> <sub>foo</sub> </tag> <tag> <sub>bar</sub> <sub>baz</sub> </tag> </root>
the <sub> is converted to a scalar the first time and to an array of scalars the second time. Big deal! Here comes ForceArray=>[qw(list of tags that may be repeated)].
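To make that concrete, here is a minimal sketch, assuming XML::Simple is installed. The variable names are mine, and I also set KeyAttr => [] so that no folding gets in the way:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $xml = <<'XML';
<root>
  <tag> <sub>foo</sub> </tag>
  <tag> <sub>bar</sub> <sub>baz</sub> </tag>
</root>
XML

# ForceArray lists the tags that may repeat; KeyAttr => [] turns off folding.
my $data = XMLin($xml, ForceArray => [qw(tag sub)], KeyAttr => []);

# Every <sub> is now an array reference, whether it occurred once or twice,
# so the same dereference works for both <tag> elements.
my @lines = map { join(', ', @{ $_->{sub} }) } @{ $data->{tag} };
print "$_\n" for @lines;
```

With the list form only the named tags are forced into arrays, so everything else stays as compact as before.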

The next problem is that it's a bit too aggressive in trying to help you, transforming

<root> <tag> <name>foo</name> <value>475</value> </tag> <tag> <name>bar</name> <value>147</value> </tag> </root>
to
{ 'tag' => { 'bar' => { 'value' => '147' }, 'foo' => { 'value' => '475' } } }
Again, huge deal: READ THE DOCS and set KeyAttr => [], or whatever list of tags/attributes you do want to fold on.
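A short sketch of the unfolded result, again assuming XML::Simple is installed (variable names are illustrative only):

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $xml = <<'XML';
<root>
  <tag> <name>foo</name> <value>475</value> </tag>
  <tag> <name>bar</name> <value>147</value> </tag>
</root>
XML

# KeyAttr => [] disables folding on name/key/id;
# ForceArray => ['tag'] keeps <tag> a list even if there is only one.
my $data = XMLin($xml, KeyAttr => [], ForceArray => ['tag']);

# Each <tag> stays a plain hash with both its 'name' and 'value' keys,
# and the tags remain in a list rather than being keyed by name.
my @pairs = map { "$_->{name}=$_->{value}" } @{ $data->{tag} };
print "$_\n" for @pairs;
```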

There is one problem, though, that has not been adequately handled in XML::Simple yet: the inconsistency of

<root> <tag>content only</tag> <tag attr="1">and content</tag> </root>
If a tag has only optional attributes, and sometimes has them and sometimes doesn't, it's harder than necessary to get at the content. You have to use ref() to see whether the <tag> produced a scalar or a hashref. There is an option (ForceContent) that forces XML::Simple to always produce the hashref, but it applies to all tags, not just those few that it makes sense for. It's not actually that hard to implement it so that it supports the same kind of settings as ForceArray. I just did that and will send a patch to the module maintainer shortly.
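For illustration, the ref() dance looks roughly like this. It is a sketch assuming XML::Simple is installed and its default content key ('content'); tag_content() is a helper name I made up, not something the module provides:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $xml = '<root> <tag>content only</tag> <tag attr="1">and content</tag> </root>';

my $data = XMLin($xml, ForceArray => ['tag'], KeyAttr => []);

# Hypothetical helper: return the text of a <tag>, whether XML::Simple
# gave us a plain scalar (no attributes) or a hashref (attributes present).
sub tag_content {
    my ($node) = @_;
    return ref $node eq 'HASH' ? $node->{content} : $node;
}

my @contents = map { tag_content($_) } @{ $data->{tag} };
print "$_\n" for @contents;
```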

So all you have to do to get a nice, clean, consistent, minimal data structure out of the XML is to set ForceArray, KeyAttr and ForceContent accordingly. Big deal.
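Put together, a sketch with all three options might look like this. It assumes XML::Simple is installed and uses the global ForceContent => 1, since that is the form the released module supports; the per-tag list form is the patch I mentioned:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $xml = '<root> <tag>content only</tag> <tag attr="1">and content</tag> </root>';

my $data = XMLin(
    $xml,
    ForceArray   => ['tag'],  # <tag> is always a list
    KeyAttr      => [],       # no folding on name/key/id
    ForceContent => 1,        # every element becomes a hashref with 'content'
);

# No ref() checks needed any more: each <tag> is a hashref.
my @contents = map { $_->{content} } @{ $data->{tag} };
print "$_\n" for @contents;
```

The price of the global setting is that every other text-only element turns into a hashref too, which is exactly why a per-tag list is nicer.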

Besides, you can infer the tags that need ForceArray and ForceContent from example XMLs, the DTD or the Schema. I actually already have the inferring from example XMLs done for my XML::Rules, and it's trivial to change it to produce the options in the XML::Simple format. The upcoming version of XML::Rules will contain functions for inferring these options from examples and DTDs, for both modules.

P.S.: Sporti69, you may of course consider using my XML::Rules instead. With a little more work it can give you a more streamlined, or even filtered and tweaked, data structure, and it would allow you to process the XML in chunks instead of loading everything into memory first and only then giving you a chance to process anything.

P.P.S.: I did not discuss one "problem" of XML::Simple: it doesn't preserve the order of child tags. When was the last time you needed that when extracting data from a data-oriented XML? That information would just waste memory and possibly complicate the access in such applications. Of course it means that XML::Simple is not suited for document-oriented XML or for modifying XML that's supposed to be used by a stricter application. If you need that, use a different module.


In reply to Re: XML Module by Jenda
in thread XML Module by Sporti69
