XML::Simple


XML::Simple design decisions
by grantm (Parson) on Nov 09, 2002 at 07:26 UTC

I know this is an old thread, but it prompted this question in the chatterbox and my response is probably a bit wordy for a chatterbox reply.

In this node it is mentioned that without forcearray the values of the hash produced by XML::Simple will produce arrayrefs in some cases and scalars in other cases... it was mentioned in the node that it did not seem to be a good design decision. What motivated that decision?

I'll start (uncharacteristically) by answering the question: simplicity was the motivation.

I needed an API that made it very easy to work with common forms of XML. For my purposes, the failing of the existing APIs was complexity: complexity born of the need to provide a comprehensive solution that covered all possible cases. I felt that for the common cases, a module could 'guess' what you wanted instead of forcing you to specify it in excruciating detail. Here's a little background...

One frequently asked question in the XML world is "should I store my data in attributes or nested elements?". For example, the data content of this XML...

  <person>
    <firstname>Bob</firstname>
    <surname>Smith</surname>
    <dob>18-Aug-1972</dob>
    <hobby>Fishing</hobby>
  </person>

... is equivalent to this XML:

  <person firstname="Bob" surname="Smith" dob="18-Aug-1972" hobby="Fishing" />

Some people prefer the first form and some prefer the second - there is no 'right' answer as long as we assume that there will only ever be one first name, one surname, one date of birth and one hobby. If we list multiple hobbies, then they must be represented as child elements since the rules of XML say an element cannot have two attributes with the same name. So we might end up with something like this:

  <person firstname="Bob" surname="Smith" dob="18-Aug-1972">
    <hobby>Fishing</hobby>
    <hobby>Trainspotting</hobby>
  </person>

To some people, this hybrid form is the obvious and sensible solution. To others, it is ugly and inconsistent. I don't really take a position on that argument and neither does XML::Simple. The XML::Simple API makes it just as easy to access data from nested elements as it is from attributes. It achieves this simplicity by applying simple rules to 'guess' what you want. If you understand the rules then you can provide hints (through options) to ensure the guesses always go your way.

Now to return to our examples, this code

  my $person = XMLin($filename);

will read both the first and second XML documents (above) into a structure like this:

  {
    firstname => "Bob" ,
    surname   => "Smith", 
    dob       => "18-Aug-1972", 
    hobby     => "Fishing",
  }

and the third XML document into a structure like this:

  {
    firstname => "Bob" ,
    surname   => "Smith", 
    dob       => "18-Aug-1972", 
    hobby     => [ "Fishing", "Trainspotting" ]
  }

By default, XML::Simple always represents an element as a scalar - unless it encounters more than one of them, in which case the scalar is 'promoted' to an array. Obviously it would be a bad thing for your code to have to check whether an element was a scalar or an arrayref before processing it - so don't do that.
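
As a minimal sketch of that promotion (assuming XML::Simple and a parser are installed; the inline documents and variable names are illustrative, not from the original post), the same key comes back as a scalar or an arrayref depending only on how many elements were seen:

```perl
use strict;
use warnings;
use XML::Simple;

# One <hobby> element: the value stays a plain scalar.
my $one = XMLin('<person><surname>Smith</surname><hobby>Fishing</hobby></person>');

# Two <hobby> elements: the same key now holds an arrayref.
my $two = XMLin('<person><surname>Smith</surname>'
              . '<hobby>Fishing</hobby><hobby>Trainspotting</hobby></person>');

# ref() returns an empty string for a non-reference, hence the fallback.
printf "one hobby:   %s\n", ref($one->{hobby}) || 'scalar';
printf "two hobbies: %s\n", ref($two->{hobby}) || 'scalar';
```

This is exactly the inconsistency that calling code should not have to test for.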

One approach to achieving more consistency is to use the 'forcearray' option like this:

  my $person = XMLin($filename, forcearray => 1);

which will read the first XML document into a structure like this:

  {
    firstname => [ "Bob" ],
    surname   => [ "Smith" ], 
    dob       => [ "18-Aug-1972" ], 
    hobby     => [ "Fishing" ],
  }

and the third XML document into a structure like this:

  {
    firstname => "Bob",
    surname   => "Smith", 
    dob       => "18-Aug-1972", 
    hobby     => [ "Fishing", "Trainspotting" ],
  }

But a better alternative is to enable forcearray only for the elements which might occur multiple times (ie: influence the guessing process):

  my $person = XMLin($filename, forcearray => [ 'hobby' ]);

which will consistently read any of the example forms into this type of structure regardless of whether there is only one hobby:

  {
    firstname => "Bob",
    surname   => "Smith", 
    dob       => "18-Aug-1972", 
    hobby     => [ "Fishing", "Trainspotting" ],
  }

Given the three possible values for the forcearray option ...

0 (always 'guess')
1 (always represent child elements as arrayrefs - even if there's only one)
a list of element names (force named elements to arrayrefs, guess for all others)

... you might well ask why I chose the first option. The truth is that I don't know. The third option is clearly the best for most people, but I couldn't use it as the default since I couldn't know in advance what elements people would want to name. The fact that I chose the worse of the two remaining options hopefully means that a few more people have read the documentation and realised option three is the one they want.
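
The three settings can be compared side by side on a document with a single hobby. This is only a sketch: the inline XML and the variable names are made up for illustration, and it assumes XML::Simple is installed.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<person><surname>Smith</surname><hobby>Fishing</hobby></person>';

my $guess = XMLin($xml, forcearray => 0);            # guess: both values stay scalars
my $all   = XMLin($xml, forcearray => 1);            # every child element becomes an arrayref
my $named = XMLin($xml, forcearray => [ 'hobby' ]);  # only <hobby> is forced to an arrayref

print "guess: surname=$guess->{surname} hobby=$guess->{hobby}\n";
print "all:   surname=$all->{surname}[0] hobby=$all->{hobby}[0]\n";
print "named: surname=$named->{surname} hobbies=@{ $named->{hobby} }\n";
```

With the third form, code that loops over `@{ $named->{hobby} }` keeps working when a second hobby turns up, while `surname` stays a convenient scalar.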

The observant reader will have noted that I said I couldn't use a list of element names as a default for the 'forcearray' option and yet that is precisely what I chose to use as the default value for the 'keyattr' option. I could quote Oscar Wilde at this point ("Consistency is the last resort of the unimaginative") but the truth is, I didn't think people would think to go looking for the 'array folding' feature so I put it somewhere where they could trip over it.
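
For anyone who has not tripped over array folding yet, here is a small sketch of what the default 'keyattr' setting does. The `<config>`/`<server>` document is invented for the example and it assumes XML::Simple is installed.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<config>'
        . '<server name="alpha" address="10.0.0.1" />'
        . '<server name="beta"  address="10.0.0.2" />'
        . '</config>';

# Default keyattr: the list of <server> elements is folded into a hash
# keyed on the value of each element's 'name' attribute.
my $folded = XMLin($xml);
print "alpha is at $folded->{server}{alpha}{address}\n";

# keyattr => [] switches folding off, leaving a plain array of hashes.
my $list = XMLin($xml, keyattr => []);
print "first server is $list->{server}[0]{name}\n";
```

Folding is handy when the key values are unique; when they are not, later elements overwrite earlier ones.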


Re: XML::Simple design decisions

by alw (Sexton) on Dec 24, 2007 at 19:52 UTC

sub array_to_hash {
.
.
.

  # Or assume keyattr => [ .... ]

 else {
  ELEMENT: for($i = 0; $i < @$arrayref; $i++)  {
   return ($arrayref) if $arrayref->[$i]{name} eq 'e_im_dev_io_entry';   #this line was added to jump out
   return($arrayref) unless(UNIVERSAL::isa($arrayref->[$i], 'HASH'));
.
.
.
}

 <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Report address="Address" name="IM Report" productID="INTRFC-MGR01">
   <Entry detail="4" name="e_im_dev_io_entry">
     <Text>Device Handle: 2</Text>
   </Entry>
   <Entry detail="4" name="e_im_dev_io_entry">
     <Text>Device Handle: 5</Text>
   </Entry>
</Report>
_______________________________________________
Options used were: none No good, I lost one element

$VAR1 = {
          'Entry' => {
                     'e_im_dev_io_entry' => {
                                            'detail' => '4',
                                            'Text' => 'Device Handle: 5'
                                          }
                   },
          'name' => 'IM Report',
          'address' => 'Address',
          'productID' => 'INTRFC-MGR01'
        };
_______________________________________________
Options used were: KeyAttr=>[]  This is what I want.

$VAR1 = {
          'Entry' => [
                     {
                       'detail' => '4',
                       'name' => 'e_im_dev_io_entry',
                       'Text' => 'Device Handle: 2'
                     },
                     {
                       'detail' => '4',
                       'name' => 'e_im_dev_io_entry',
                       'Text' => 'Device Handle: 5'
                     }
                   ],
          'name' => 'IM Report',
          'address' => 'Address',
          'productID' => 'INTRFC-MGR01'
        };


Re^2: XML::Simple design decisions

by Jenda (Abbot) on Dec 29, 2007 at 00:53 UTC

You could also use XML::Rules instead of XML::Simple as it gives you more detailed control over what data structure gets generated.

Something like:

use XML::Rules;
 # at least 0.22 (for the stripspaces)
 # see http://www.perlmonks.org/?node_id=658971
my $parser = XML::Rules->new(
    rules => [
        Text => 'content',
        Entry => 'as array',
        Report => 'pass',
        Other => sub {return delete($_[1]->{name}) => $_[1]},
    ],
    stripspaces  => 3,
);

my $data = $parser->parse(\*DATA);

use Data::Dumper;
print Dumper($data);

__DATA__
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Report address="Address" name="IM Report" productID="INTRFC-MGR01">
   <Entry detail="4" name="e_im_dev_io_entry">
     <Text>Device Handle: 2</Text>
   </Entry>
   <Entry detail="4" name="e_im_dev_io_entry">
     <Text>Device Handle: 5</Text>
   </Entry>
   <Other detail="4" name="first">
     <Text>Device Handle: 5</Text>
   </Other>
   <Other detail="4" name="second">
     <Text>Device Handle: 5</Text>
   </Other>
</Report>

It doesn't try to guess as XML::Simple does, so it's more work though. (In the not-yet-released 0.23, the rule for the Other tag will be just Other => 'by name'.)

Jenda
Support Denmark!
Defend the free world!


Re^3: XML::Simple design decisions

by alw (Sexton) on Dec 29, 2007 at 03:33 UTC

Re^2: XML::Simple design decisions

by ysth (Canon) on Dec 25, 2007 at 10:06 UTC

KeyAttr=>[]
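
Spelled out as a sketch against the Report document from the parent node (the XML declaration is dropped for brevity; adding `forcearray => ['Entry']` is an assumption on top of the suggestion, so that a report with a single Entry has the same shape), no patch to the module is needed:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<Report address="Address" name="IM Report" productID="INTRFC-MGR01">
   <Entry detail="4" name="e_im_dev_io_entry">
     <Text>Device Handle: 2</Text>
   </Entry>
   <Entry detail="4" name="e_im_dev_io_entry">
     <Text>Device Handle: 5</Text>
   </Entry>
</Report>
XML

# keyattr => [] disables folding on 'name', so duplicate names cannot collide;
# forcearray => ['Entry'] keeps Entry an arrayref even if only one is present.
my $report = XMLin($xml, keyattr => [], forcearray => ['Entry']);

# Both entries survive, in document order.
print "$_->{Text}\n" for @{ $report->{Entry} };
```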

--
CollegeGear.com - more than just college gear (though, yes, we have college-branded teddy bears)


RE: XML::Simple
by Anonymous Monk on Oct 30, 2000 at 14:58 UTC

I want a lightweight module to do easy stuff with config files. I read the above and tried to install LWP::Simple. However, LWP::Simple depends on XML::Parser and Storable. I have Storable of course, but XML::Parser needs a C library called "expat", which is not available as a stable Debian package. OK, OK, SourceForge is nice and I don't really mind building stuff from tarballs, but "Simple"?? Not really.


RE: RE: XML::Simple

by mirod (Canon) on Oct 30, 2000 at 19:03 UTC

XML::Simple (and not LWP::Simple) is simple to use. As for installing it, I don't see what the problem is with typing the usual make/make test/su/make install mantra.

I am afraid that if you want to process XML in Perl you will have to install XML::Parser, including the expat library. Unix users usually have gcc around, and XML::Parser comes pre-installed with the ActiveState port on Windows. So it really shouldn't be a problem.

The only real problem that you might come across is that you might have too many versions of expat installed, as some of the Apache tools come with their own, dynamically linked and slightly incompatible version of the library. See the XML::Parser review for more details. By the way, a team including Clark Cooper (the maintainer of XML::Parser), Apache people and even (gasp!) Python developers (Python XML tools are also based on expat) is working on the problem.


Re: RE: RE: XML::Simple

by htoug (Deacon) on Oct 02, 2001 at 13:05 UTC

It does not compile on Tru64 (Digital^WCompaq^WHP Unix on alpha), so effectively you cannot process XML with perl on Tru64.

Update: On Tru64 gcc is not recommended; the vendor delivers a C compiler that is better than gcc. Perl is almost invariably compiled with the vendor-supplied compiler (which is a bit pickier than gcc in adhering to standards and error checking). Thus some^Wtoo many OS projects are unavailable on Tru64 - but this is getting a bit OT.

XML::Parser 2.29 and earlier were supplied with a version of expat that compiles in almost as many places as Perl. I have hung on to that version ;-)


Re: Re: RE: RE: XML::Simple

by mirod (Canon) on Oct 02, 2001 at 13:58 UTC

Re: RE: XML::Simple

by moodster (Hermit) on Oct 23, 2001 at 22:40 UTC

apt-get install libxml-simple-perl

Anyone tried building expat under cygwin on an NT machine? I've given up.


Re: RE: XML::Simple

by Anonymous Monk on Mar 14, 2001 at 03:54 UTC

If you really don't want to install XML::Parser, then look at some of the other config modules. I like Config::IniFiles


Re: XML::Simple
by vbrtrmn (Pilgrim) on Nov 06, 2002 at 23:25 UTC

I recently ran into a few problems installing XML::Simple on my linux box (mandrake 8.2). You currently can't install XML::Simple without XML::Parser.

Anyway, here's what you'll need to get it done:

Install Expat XML Parser, download here: http://sourceforge.net/project/showfiles.php?group_id=10127
I advise installing from the source rather than using the RPM, as it didn't work for me.
Install XML::Parser
XML::Simple currently doesn't work without XML::Parser. I just installed through the CPAN shell, no problems after installing expat.
Install XML::Simple
Also, installed easily through the CPAN shell.

Without installing expat, I could not install XML::(anything).

Updated Jan 31, 2003

I recently rebuilt the box with Mandrake 9.0 (it was 8.0) and have the same problems: XML::Simple and XML::Parser will not install with the expat RPMs.
I tried both the current release of expat (1.95.6) and the one from the Mandrake ISO (1.95.2); with NEITHER of the RPMs will either of the mentioned XML modules install properly. If I install 1.95.6 from source, though, both modules install fine.

Not sure if it is a Mandrake thing or what. I'm pretty sure I haven't been smoking too much crack though.


Re: Re: XML::Simple

by grantm (Parson) on Nov 07, 2002 at 00:06 UTC

I'm not sure why you had RPM problems, but the first place to look for an RPM of expat is on the Mandrake CD. I'm not a mandrake user, but expat is a standard feature of RedHat.

If you don't want to use XML::Parser then XML::Simple version 1.08_01 or later can work with *any* SAX parser. Just install XML::SAX and then, say, XML::LibXML.
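
A minimal sketch of steering XML::Simple towards a SAX parser, assuming XML::SAX is installed. It picks XML::SAX::PurePerl, which ships with XML::SAX and needs no expat; with XML::LibXML installed, its SAX driver could be named instead. The inline document is only an illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Assumption: XML::SAX is installed. Naming a SAX parser here stops
# XML::Simple from falling back to XML::Parser (and therefore expat).
$XML::Simple::PREFERRED_PARSER = 'XML::SAX::PurePerl';

my $person = XMLin('<person firstname="Bob" surname="Smith" />');
print "$person->{firstname} $person->{surname}\n";
```

The pure-Perl parser is slow on large documents, so it suits small config files better than bulk data.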

