jthomas has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

Would somebody please shed some light on what's happening here?

I'm trying to parse a sample XML file like the one below:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<config>
  <type name="default">
    <report>Dummy1</report>
  </type>
  <type name="scenario1">
    <report>Dummy2</report>
  </type>
</config>

When I parse it with a sample script like the one below:

use XML::Simple;
use Data::Dumper;

my $lXMLFile = "$ENV{'PWD'}/xmlsample.xml";
my $Config = XMLin($lXMLFile);
print Dumper($Config);

I get the expected result:

$VAR1 = {
  'type' => {
    'scenario1' => {
      'report' => 'Dummy2'
    },
    'default' => {
      'report' => 'Dummy1'
    }
  }
};

But if my XML contains only one <type> tag:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<config>
  <type name="default">
    <report>Dummy1</report>
  </type>
</config>

"default" does not become a key as it does above; instead it appears as a "name" value alongside the "report" key. See the wrong output below:

$VAR1 = {
  'type' => {
    'report' => 'Dummy1',
    'name' => 'default'
  }
};

I was just wondering why this behaves differently when we have two <type> tags and when we have only one. Would somebody please help me with this? I would like to get output like the one below even when there is only one <type> element. Am I missing something? :(

$VAR1 = {
  'type' => {
    'default' => {
      'report' => 'Dummy1'
    }
  }
};

Cheers and thanks a lot,
Jins Thomas

Replies are listed 'Best First'.
Re: XML::Simple parsing into a hash wierd behaviour
by ikegami (Patriarch) on Apr 20, 2010 at 05:58 UTC

    XML::Simple is one of the most complicated XML parsers I know, in part because you need to instruct it about the structure of the XML document you are parsing.

    In this case, I think what you need is

    ForceArray => [qw( type )]

    An alternative solution with side-effects:

    ForceArray => 1
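To make the difference between the two options concrete, here is a minimal sketch. It assumes the single-<type> document from the question, inlined as a string instead of read from xmlsample.xml; the variable names $targeted and $blanket are made up for illustration.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# The single-<type> document from the question, inlined as a string.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<config>
  <type name="default">
    <report>Dummy1</report>
  </type>
</config>
XML

# Force only <type> into an array, so it is still folded on "name"
# even when there is a single element.
my $targeted = XMLin($xml, ForceArray => ['type']);

# Force every nested element into an array. <type> is folded the same
# way, but <report> now becomes an array reference as well.
my $blanket = XMLin($xml, ForceArray => 1);

$Data::Dumper::Sortkeys = 1;
print Dumper($targeted, $blanket);
```

With the targeted form, $targeted->{type}{default}{report} is the plain string 'Dummy1'. With ForceArray => 1 the same value has to be reached as $blanket->{type}{default}{report}[0], which is the side effect mentioned above.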

      At one point or another you need to know the structure of the XML. You may give some of that info to the parser and obtain a simplified structure, or give it none and obtain a very generic structure, most probably containing a lot of information you do not really need and, in some cases, need to more or less explicitly ignore or strip. Parsing the XML is just the first step; it may be a short one or a longer one.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

        I don't disagree with the principle, but the devil is in the details.

        • XML::Simple defaults to unsafe behaviour.

        • The whole idea of doing a little work up front to save work later on just doesn't pan out in my experience with XML::Simple. I've already exposed this myth.

        • Simplifying the tree sounds good, but all it really does is make XML::Simple usable. Alternatives have query mechanisms that allow one to jump around the tree just as easily.

        And then there are the limitations of XML::Simple.
        • XML::Simple handles namespaces VERY poorly.

          • It fails if different documents use different prefixes. (Prefixes are arbitrary.)
          • It fails if different documents use different means of specifying the namespace of a given node. (Prefix vs explicit xmlns vs inherited xmlns)

          This defect can be fixed.

        • It can only handle some XML formats.

          • It can't parse formats where one needs to know the order of differently named nodes.
          • It can't generate XML for formats where the order of differently named nodes is relevant or specified.
          • It can't handle formats that intermix text and element nodes (e.g. XHTML).

          This limitation is intrinsic to the design and cannot be fixed.
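The ordering limitation is easy to see with a small sketch. The document below is a made-up example (the element names doc, a and b are not from the thread), parsed with default options:

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Hypothetical document in which <a> and <b> elements are interleaved.
my $data = XMLin('<doc><a>1</a><b>2</b><a>3</a></doc>');

# The two <a> elements are grouped into one array and <b> is stored
# under its own hash key, so the fact that <b> sat between the two
# <a> elements cannot be recovered from the resulting structure.
$Data::Dumper::Sortkeys = 1;
print Dumper($data);
```

The result has 'a' => [ '1', '3' ] and 'b' => '2'. Relative order among same-named siblings survives, but order across differently named siblings does not.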

      ForceArray does not help here. The issue is that XML::Simple takes "name" as a key attribute by default.

      Peter (Guo) Pei

        Did you try it?

        Without ForceArray => [qw( type )]:

        $VAR1 = {
          'type' => {
            'scenario1' => {
              'report' => 'Dummy2'
            },
            'default' => {
              'report' => 'Dummy1'
            }
          }
        };

        $VAR1 = {
          'type' => {
            'report' => 'Dummy1',
            'name' => 'default'
          }
        };

        With ForceArray => [qw( type )]:

        $VAR1 = {
          'type' => {
            'scenario1' => {
              'report' => 'Dummy2'
            },
            'default' => {
              'report' => 'Dummy1'
            }
          }
        };

        $VAR1 = {
          'type' => {
            'default' => {
              'report' => 'Dummy1'
            }
          }
        };

        That's exactly the output the OP requested.
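Once <type> is always folded the same way, calling code no longer needs to special-case the single-element document. A rough sketch of that, assuming the two documents from the question inlined as strings; reports_by_type is a hypothetical helper, not part of XML::Simple:

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical helper: returns a hash ref mapping each type name to its report.
sub reports_by_type {
    my ($xml) = @_;
    my $config = XMLin($xml, ForceArray => ['type']);
    my $types  = $config->{type};
    return { map { $_ => $types->{$_}{report} } keys %$types };
}

my $one = reports_by_type(
    '<config><type name="default"><report>Dummy1</report></type></config>'
);
my $two = reports_by_type(
    '<config>'
  . '<type name="default"><report>Dummy1</report></type>'
  . '<type name="scenario1"><report>Dummy2</report></type>'
  . '</config>'
);

# The same loop works for either document.
for my $reports ($one, $two) {
    print "$_ => $reports->{$_}\n" for sort keys %$reports;
}
```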

Re: XML::Simple parsing into a hash wierd behaviour
by PeterPeiGuo (Hermit) on Apr 20, 2010 at 06:00 UTC

    If you take a look at the source code for XML::Simple, you will find this line:

    my @DefKeyAttr     = qw(name key id);

    This is a good indication of why XML::Simple behaved the way you observed.

    You can fix the issue by passing in the KeyAttr:

    my $parser = new XML::Simple(KeyAttr=>"");
    my $Config = $parser->XMLin("a.xml");

    That gives you:

    $VAR1 = {
      'type' => [
        {
          'report' => 'Dummy1',
          'name' => 'default'
        },
        {
          'report' => 'Dummy2',
          'name' => 'scenario1'
        }
      ]
    };

    Peter (Guo) Pei

      That's an invalid value for KeyAttr. You need to use KeyAttr=>[] or KeyAttr=>{} to override the default.

      That said, your code doesn't produce the desired output even with this fix.
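For reference, a small sketch of what KeyAttr => [] does to both documents from the question. The XML is inlined as strings here, and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $two_xml = '<config>'
            . '<type name="default"><report>Dummy1</report></type>'
            . '<type name="scenario1"><report>Dummy2</report></type>'
            . '</config>';
my $one_xml = '<config>'
            . '<type name="default"><report>Dummy1</report></type>'
            . '</config>';

# An empty KeyAttr list switches array folding off entirely.
my $two_types = XMLin($two_xml, KeyAttr => []);
my $one_type  = XMLin($one_xml, KeyAttr => []);

# Two <type> elements: an array of hashes, each keeping its "name".
# One <type> element: a single hash with "name" and "report" keys.
$Data::Dumper::Sortkeys = 1;
print Dumper($two_types, $one_type);
```

Neither result is keyed by "default", and the two documents still produce different shapes (array versus hash) unless ForceArray is also given.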

        Try running it first. I attached the result above. It is more consistent with the output when there is only one type tag. I re-read the OP, and it's true that he would like to keep name as the key. I missed that.

        Peter (Guo) Pei

      Hi, with ForceArray it works. Thanks. But my initial try with KeyAttr didn't work; even when I changed the keyword "name" to something else, the KeyAttr changes didn't help. Am I missing something?

        No, you didn't miss anything. I missed one sentence in your OP, that you wanted name as the key - you said that you didn't like the output from the single element case.

        My focus was on making the multi-element case produce a result consistent with the single-element case ;-)

        Hope this puts both of us on the same page.

        Peter (Guo) Pei