

by mirod (Canon)
on Sep 01, 2000 at 13:35 UTC

Item Description: A simple interface to XML documents

Review Synopsis: Very convenient for config file and simple XML files


XML::Simple - Trivial API for reading and writing XML (esp config files)

XML::Simple loads an XML file into memory as a convenient data structure that can be accessed, updated, and then written back out.
A number of options let users specify how the structure should be built. The parsed structure can also be cached using Storable.

Why use XML::Simple?

  • XML configuration files, small table, data-oriented XML
  • simple XML processing
  • you don't care much about XML but find it convenient as a standard file format, to replace csv or a home-brewed format

Why NOT use XML::Simple?

  • your XML data is too complex for XML::Simple to deal with:
    - it includes mixed content (<elt>th<is>_</is>_ mixed content</elt>)
    - your documents are too big to fit in memory
    - you are dealing with document-oriented XML rather than data-oriented XML
  • you want to use a standard-based module (XML::DOM for example)

Personal notes

I don't use XML::Simple in production but the module seems quite mature, and very convenient for "light" XML: config files, tables, generally data-oriented, shallow XML (the XML tree is not really deep), as opposed to document-oriented XML.

Update: make sure you read the documentation about the forcearray option or you might get bitten by repeated elements being turned into an array (which is OK) _except_ when there is only one of them, in which case they become just a hash value (bad!).
For example, this document:

<config dir="/usr/local/etc" log="/usr/local/log">
  <user id="user1">
    <group>root</group>
    <group>webadmin</group>
  </user>
  <user id="user2">
    <group>staff</group>
  </user>
</config>

when loaded with XMLin without the forcearray option, becomes:

{
  'dir'  => '/usr/local/etc',
  'log'  => '/usr/local/log',
  'user' => {
    'user1' => { 'group' => ['root', 'webadmin'] },
    'user2' => { 'group' => 'staff' }
  }
};

Note the two different ways the group elements are processed: an arrayref for user1, a plain scalar for user2.
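
A minimal sketch of the fix, assuming XML::Simple is installed. It feeds the same document to XMLin as a string and names group in the forcearray option; KeyAttr is spelled out only so that the folding on id does not depend on the defaults.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<config dir="/usr/local/etc" log="/usr/local/log">
  <user id="user1">
    <group>root</group>
    <group>webadmin</group>
  </user>
  <user id="user2">
    <group>staff</group>
  </user>
</config>
XML

# ForceArray => ['group'] makes every <group> an arrayref, even a lone one.
my $config = XMLin($xml, ForceArray => ['group'], KeyAttr => ['id']);

# The same loop now works for both users, with no ref() checks.
for my $id (sort keys %{ $config->{user} }) {
    my @groups = @{ $config->{user}{$id}{group} };
    print "$id: @groups\n";
}
```

With group forced to an array, user2's single group comes back as ['staff'] instead of 'staff'.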

I also found that XML::Simple can be a little dangerous in that it leads to writing XML that is a little too simple. Often when using it I end up with an XML structure that's as shallow as I can possibly make it, which might not be really clean.

XML::Simple design decisions
by grantm (Parson) on Nov 09, 2002 at 07:26 UTC

    I know this is an old thread, but it prompted this question in the chatterbox and my response is probably a bit wordy for a chatterbox reply.

    In this node it is mentioned that without forcearray the values of the hash produced by XML::Simple will produce arrayrefs in some cases and scalars in other cases... it was mentioned in the node that it did not seem to be a good design decision. What motivated that decision?

    I'll start (uncharacteristically) by answering the question: simplicity was the motivation.

    I needed an API that made it very easy to work with common forms of XML. For my purposes, the failing of the existing APIs was complexity: complexity born from the need to provide a comprehensive solution covering all possible cases. I felt that for the common cases, a module could 'guess' what you wanted instead of forcing you to specify it in excruciating detail. Here's a little background...

    One frequently asked question in the XML world is "should I store my data in attributes or nested elements?". For example, the data content of this XML...

    <person>
      <firstname>Bob</firstname>
      <surname>Smith</surname>
      <dob>18-Aug-1972</dob>
      <hobby>Fishing</hobby>
    </person>

    ... is equivalent to this XML:

    <person firstname="Bob" surname="Smith" dob="18-Aug-1972" hobby="Fishing" />

    Some people prefer the first form and some prefer the second - there is no 'right' answer as long as we assume that there will only ever be one first name, one surname, one date of birth and one hobby. If we list multiple hobbies, then they must be represented as child elements since the rules of XML say an element cannot have two attributes with the same name. So we might end up with something like this:

    <person firstname="Bob" surname="Smith" dob="18-Aug-1972">
      <hobby>Fishing</hobby>
      <hobby>Trainspotting</hobby>
    </person>

    To some people, this hybrid form is the obvious and sensible solution. To others, it is ugly and inconsistent. I don't really take a position on that argument and neither does XML::Simple. The XML::Simple API makes it just as easy to access data from nested elements as it is from attributes. It achieves this simplicity by applying simple rules to 'guess' what you want. If you understand the rules then you can provide hints (through options) to ensure the guesses always go your way.

    Now to return to our examples, this code

    my $person = XMLin($filename);

    will read both the first and second XML documents (above) into a structure like this:

    {
      firstname => "Bob",
      surname   => "Smith",
      dob       => "18-Aug-1972",
      hobby     => "Fishing",
    }

    and the third XML document into a structure like this:

    {
      firstname => "Bob",
      surname   => "Smith",
      dob       => "18-Aug-1972",
      hobby     => [ "Fishing", "Trainspotting" ],
    }

    By default, XML::Simple always represents an element as a scalar - unless it encounters more than one of them, in which case the scalar is 'promoted' to an array. Obviously it would be a bad thing for your code to have to check whether an element was a scalar or an arrayref before processing it - so don't do that.

    One approach to achieving more consistency is to use the 'forcearray' option like this:

    my $person = XMLin($filename, forcearray => 1);

    which will read the first XML document into a structure like this:

    {
      firstname => [ "Bob" ],
      surname   => [ "Smith" ],
      dob       => [ "18-Aug-1972" ],
      hobby     => [ "Fishing" ],
    }

    and the third XML document into a structure like this:

    {
      firstname => "Bob",
      surname   => "Smith",
      dob       => "18-Aug-1972",
      hobby     => [ "Fishing", "Trainspotting" ],
    }

    But a better alternative is to enable forcearray only for the elements which might occur multiple times (i.e., to influence the guessing process):

    my $person = XMLin($filename, forcearray => [ 'hobby' ]);

    which will consistently read any of the example forms into this type of structure regardless of whether there is only one hobby:

    {
      firstname => "Bob",
      surname   => "Smith",
      dob       => "18-Aug-1972",
      hobby     => [ "Fishing", "Trainspotting" ],
    }
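
    As a rough illustration (assuming XML::Simple is installed; the two XML strings are cut-down, made-up versions of the examples above), the same loop can handle a person with one hobby and a person with two once hobby is named in forcearray:

```perl
use strict;
use warnings;
use XML::Simple;

# Two hypothetical inputs: one hobby, then two hobbies.
my @docs = (
    '<person firstname="Bob"><hobby>Fishing</hobby></person>',
    '<person firstname="Bob"><hobby>Fishing</hobby><hobby>Trainspotting</hobby></person>',
);

my @hobby_counts;
for my $xml (@docs) {
    # Only <hobby> is forced to an arrayref; firstname stays a scalar.
    my $person  = XMLin($xml, ForceArray => ['hobby']);
    my @hobbies = @{ $person->{hobby} };
    push @hobby_counts, scalar @hobbies;
    print "$person->{firstname}: ", join(', ', @hobbies), "\n";
}
```

    The calling code never has to test whether hobby is a scalar or an arrayref.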

    Given the three possible values for the forcearray option ...

    1. 0 (always 'guess')
    2. 1 (always represent child elements as arrayrefs - even if there's only one)
    3. a list of element names (force named elements to arrayrefs, guess for all others)

    ... you might well ask why I chose the first option. The truth is that I don't know. The third option is clearly the best for most people, but I couldn't use it as the default since I couldn't know in advance what elements people would want to name. The fact that I chose the worse of the two remaining options hopefully means that a few more people have read the documentation and realised option three is the one they want.

    The observant reader will have noted that I said I couldn't use a list of element names as a default for the 'forcearray' option and yet that is precisely what I chose to use as the default value for the 'keyattr' option. I could quote Oscar Wilde at this point ("Consistency is the last refuge of the unimaginative") but the truth is, I didn't think people would think to go looking for the 'array folding' feature so I put it somewhere where they could trip over it.
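
    To make 'array folding' concrete, here is a small sketch, assuming XML::Simple is installed and using a made-up document. With the default keyattr (name, key, id) the repeated user elements are folded into a hash keyed on id; passing an empty list turns folding off.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical input with two elements that carry distinct "id" attributes.
my $xml = '<opt><user id="u1" shell="bash"/><user id="u2" shell="zsh"/></opt>';

# Default KeyAttr: <user> elements are folded into a hash keyed on "id".
my $folded = XMLin($xml);
print "folded:   $folded->{user}{u1}{shell}\n";

# KeyAttr => [] switches folding off, so <user> stays an array of hashes.
my $unfolded = XMLin($xml, KeyAttr => []);
print "unfolded: $unfolded->{user}[0]{id} uses $unfolded->{user}[0]{shell}\n";
```

    Folding is only lossless when the key values are unique; two elements sharing the same key value collapse into one hash entry.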

      I have a problem with array folding that I could only solve by subclassing XML::Simple and hardcoding a return in the array_to_hash method.
      sub array_to_hash {
        . . .
        # Or assume keyattr => [ .... ]
        else {
          ELEMENT: for($i = 0; $i < @$arrayref; $i++) {
            return ($arrayref) if $arrayref->[$i]{name} eq 'e_im_dev_io_entry'; # this line was added to jump out
            return($arrayref) unless(UNIVERSAL::isa($arrayref->[$i], 'HASH'));
            . . .
      }
      If an attribute called "name" has the same value in multiple nested elements, then only one attribute will remain after the array folding. This example is only a part of a larger xml file. I don't want to use KeyAttr=>[] which does prevent the folding, since in other parts of the file, the array folding is desirable. I only want to prevent array folding if the attribute value is equal to "something". I have tried many options with no success. Am I missing something, or is subclassing the only way?
      <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
      <Report address="Address" name="IM Report" productID="INTRFC-MGR01">
        <Entry detail="4" name="e_im_dev_io_entry">
          <Text>Device Handle: 2</Text>
        </Entry>
        <Entry detail="4" name="e_im_dev_io_entry">
          <Text>Device Handle: 5</Text>
        </Entry>
      </Report>
      _______________________________________________
      Options used were: none
      No good, I lost one element

      $VAR1 = {
        'Entry' => {
          'e_im_dev_io_entry' => {
            'detail' => '4',
            'Text' => 'Device Handle: 5'
          }
        },
        'name' => 'IM Report',
        'address' => 'Address',
        'productID' => 'INTRFC-MGR01'
      };
      _______________________________________________
      Options used were: KeyAttr=>[]
      This is what I want.

      $VAR1 = {
        'Entry' => [
          {
            'detail' => '4',
            'name' => 'e_im_dev_io_entry',
            'Text' => 'Device Handle: 2'
          },
          {
            'detail' => '4',
            'name' => 'e_im_dev_io_entry',
            'Text' => 'Device Handle: 5'
          }
        ],
        'name' => 'IM Report',
        'address' => 'Address',
        'productID' => 'INTRFC-MGR01'
      };
      I would like the output to be identical to the last example. I don't want to lose any elements. Again, this is just a small piece of a much larger xml document. There are lots of Entry elements so I can't use KeyAttr {...}

        You could also use XML::Rules instead of XML::Simple as it gives you more detailed control over what data structure gets generated.

        Something like:

        use XML::Rules; # at least 0.22 (for the stripspaces)
        # see
        my $parser = XML::Rules->new(
          rules => [
            Text   => 'content',
            Entry  => 'as array',
            Report => 'pass',
            Other  => sub { return delete($_[1]->{name}) => $_[1] },
          ],
          stripspaces => 3,
        );
        my $data = $parser->parse(\*DATA);
        use Data::Dumper;
        print Dumper($data);
        __DATA__
        <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
        <Report address="Address" name="IM Report" productID="INTRFC-MGR01">
          <Entry detail="4" name="e_im_dev_io_entry">
            <Text>Device Handle: 2</Text>
          </Entry>
          <Entry detail="4" name="e_im_dev_io_entry">
            <Text>Device Handle: 5</Text>
          </Entry>
          <Other detail="4" name="first">
            <Text>Device Handle: 5</Text>
          </Other>
          <Other detail="4" name="second">
            <Text>Device Handle: 5</Text>
          </Other>
        </Report>

        It doesn't try to guess as XML::Simple does, so it's more work. (In the not-yet-released 0.23, the rule for the Other tag will be just Other => 'by name'.)

RE: XML::Simple
by Anonymous Monk on Oct 30, 2000 at 14:58 UTC
    I want a lightweight module to do easy stuff with config files. I read the above and tried to install LWP::Simple. However, LWP::Simple depends on XML::Parser and Storable. I have Storable of course, but XML::Parser needs a C library called "expat", which is not available as a stable Debian package. OK, OK, SourceForge is nice and I don't really mind building stuff from tarballs, but "Simple"?? Not really.

      XML::Simple (and not LWP::Simple) is simple to use. As for installing it, I don't see what the problem is with typing the usual make/make test/su/make install mantra.

      I am afraid that if you want to process XML in Perl you will have to install XML::Parser, including the expat library. Unix users usually have gcc around, and XML::Parser comes pre-installed with the ActiveState port on Windows. So it really shouldn't be a problem.

      The only real problem that you might come across is that you might have too many versions of expat installed, as some of the Apache tools come with their own, dynamically linked and slightly incompatible version of the library. See the XML::Parser review for more details. By the way, a team including Clark Cooper (the maintainer of XML::Parser), Apache people and even (gasp!) Python developers (Python XML tools are also based on expat) is working on the problem.

        expat can be a pain.

        It does not compile on Tru64 (Digital^WCompaq^WHP Unix on alpha), so effectively you cannot process XML with perl on Tru64.

        Update: On Tru64 gcc is not recommended; the vendor delivers a C compiler that is better than gcc. Perl is almost invariably compiled with the vendor-supplied compiler (which is a bit pickier than gcc in adhering to standards and error checking). Thus some^Wtoo many OS projects are unavailable on Tru64 - but this is getting a bit OT.

        XML::Parser 2.29 and earlier were supplied with a version of expat that compiles in almost as many places as Perl does; I have hung on to that version ;-)

      I know I'm replying to a post made over a year ago, but I want to point out that for debian users, there is now an easy way to install XML::Simple. Just do
      apt-get install libxml-simple-perl
      and you don't have to mess around with expat.

      Anyone tried building expat under cygwin on an NT machine? I've given up.

      If you really don't want to install XML::Parser, then look at some of the other config modules. I like Config::IniFiles
Re: XML::Simple
by vbrtrmn (Pilgrim) on Nov 06, 2002 at 23:25 UTC

    I recently ran into a few problems installing XML::Simple on my Linux box (Mandrake 8.2). You currently can't install XML::Simple without XML::Parser.

    Anyway, here's what you'll need to get it done:

    1. Install Expat XML Parser, download here:
      I advise installing from the source rather than using the RPM, as it didn't work for me.
    2. Install XML::Parser
      XML::Simple currently doesn't work without XML::Parser. I just installed it through the CPAN shell; no problems after installing expat.
    3. Install XML::Simple
      Also, installed easily through the CPAN shell.

    Without installing expat, I could not install XML::(anything).

    Updated Jan 31, 2003

    I recently rebuilt the box with Mandrake 9.0 (it was 8.0) and have the same problem: XML::Simple and XML::Parser do not install when expat comes from the RPMs.
    I tried both the current release of expat (1.95.6) and the one from the Mandrake ISO (1.95.2); with NEITHER of the RPMs will either of the XML modules install properly. If I install 1.95.6 from source, though, both modules install fine.

    Not sure if it is a Mandrake thing or what. I'm pretty sure I haven't been smoking too much crack though.


      I'm not sure why you had RPM problems, but the first place to look for an RPM of expat is on the Mandrake CD. I'm not a Mandrake user, but expat is a standard feature of RedHat.

      If you don't want to use XML::Parser, then XML::Simple version 1.08_01 or later can work with *any* SAX parser. Just install XML::SAX and then, say, XML::LibXML.
