XML Module

Sporti69 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: XML Module
by psini (Deacon) on May 31, 2008 at 14:07 UTC

You could start by looking at the XML::Parser tutorial in the Monastery library and XML::SAX::Intro on CPAN.

Careful with that hash Eugene.


Re: XML Module
by Your Mother (Archbishop) on May 31, 2008 at 17:21 UTC

I second what psini says. XML::Simple considered harmful. This is the normal lifecycle with it:

You need an XML module... Let's check CPAN.
Wow, XML::Parser, XML::LibXML, XML::Compile, XML::Twig... Yikes. This could take hours to learn one of these. Oh, hey! XML::Simple.
That worked great!
Oh, but you need it a little different. So just read the XML::Simple docs. It should be simple to change.
...Hours go by...
...Things are thrown...
...Questions are asked at SOPW...
Someone mentions several older threads and trying a search first next time.
You end up getting one of the fully featured modules.
You spend about as much time learning it as you already flushed fighting XML::Simple over whether your data are elements or attributes.
Now you have a real skill with a powerful tool and if you picked something with a DOM standards interface like LibXML, you also just picked up a bunch of transferable skills like hacking JS.

For my own part, I find XML::LibXML, XML::Compile, and XML::Twig to be the sweet spots.


Re: XML Module
by ides (Deacon) on May 31, 2008 at 15:04 UTC

The easiest way to do that is to use XML::Simple. However, it isn't the fastest module if your XML documents are really large and you only need to work with certain pieces of them. For cases like that I'd suggest using XML::Twig or the previously mentioned XML::Parser.
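As a rough illustration of the "only certain pieces" case, here is a minimal XML::Twig sketch. The original question is not shown in this thread, so the sample XML and the %value_of hash are assumptions made up for the example, not the poster's data. The handler fires once per <tag> element and then purges what has been parsed so far, which is what keeps memory use low on large documents.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; a real document would be much larger.
my $xml = <<'XML';
<root>
 <tag><name>foo</name><value>475</value></tag>
 <tag><name>bar</name><value>147</value></tag>
</root>
XML

my %value_of;    # illustrative name => value lookup

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <tag> element has been parsed.
        tag => sub {
            my ($t, $elt) = @_;
            $value_of{ $elt->first_child_text('name') }
                = $elt->first_child_text('value');
            $t->purge;    # discard the part of the tree already handled
        },
    },
);
$twig->parse($xml);

print "$_ = $value_of{$_}\n" for sort keys %value_of;
```

For a file on disk, parsefile() would replace parse() in the last step.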

Hope that helps!

Frank Wiles <frank@revsys.com>
www.revsys.com


Re^2: XML Module
by psini (Deacon) on May 31, 2008 at 16:12 UTC

I myself dislike XML::Simple because not only is it not very fast, it also has some questionable default behaviours, like the infamous KeyAttr option (in the last week alone, no fewer than 3 questions here in SoPW were related to this "feature").

I believe that simple should mean "simple", not "simple if you want to do things my way only".

Careful with that hash Eugene.


Re: XML Module
by Jenda (Abbot) on Jun 01, 2008 at 13:39 UTC

Looks like XML::Simple gets a lot of bad press here. I guess it stems from the fact that most people fail to read the docs, short as they are. Let's see what the problems with XML::Simple are.

First, the data structure it produces is not always consistent. Eg. for XML like this:

<root>
 <tag>
  <sub>foo</sub>
 </tag>
 <tag>
  <sub>bar</sub>
  <sub>baz</sub>
 </tag>
</root>

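the <sub> in the first <tag> comes out as a plain string, while the <sub> in the second comes out as an array reference. The fix is to list every tag that may be repeated:

ForceArray=>[qw(list of tags that may be repeated)]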
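A minimal sketch of the ForceArray point, using the XML above. The variable names $loose and $strict are only illustrative, and the script assumes XML::Simple and one of its parser back ends are installed.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<root>
 <tag>
  <sub>foo</sub>
 </tag>
 <tag>
  <sub>bar</sub>
  <sub>baz</sub>
 </tag>
</root>
XML

# Default options: a lone <sub> is a string, repeated <sub>s are an array ref.
my $loose = XMLin($xml);

# With ForceArray naming 'sub', every <sub> is an array ref, even a lone one.
my $strict = XMLin($xml, ForceArray => ['sub']);

# The same loop now works for every <tag>.
for my $tag (@{ $strict->{tag} }) {
    print join(', ', @{ $tag->{sub} }), "\n";
}
```

In real code 'tag' would normally go into the ForceArray list as well, so that a document with a single <tag> still yields an array.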

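The next problem is that it's a bit too aggressive in trying to help you: by default it folds XML like the first block below into the nested hash shown after it, keyed by the content of <name>.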

<root>
 <tag>
  <name>foo</name>
  <value>475</value>
 </tag>
 <tag>
  <name>bar</name>
  <value>147</value>
 </tag>
</root>

{
  'tag' => {
    'bar' => {
      'value' => '147'
    },
    'foo' => {
      'value' => '475'
    }
  }
}

If you do not want that folding, switch it off:

KeyAttr => []
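A short sketch comparing the default folding with KeyAttr => [], using the name/value XML above. $folded and $plain are illustrative names, and adding ForceArray for 'tag' is a choice made for this example so the result stays an array even with a single <tag>.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<root>
 <tag>
  <name>foo</name>
  <value>475</value>
 </tag>
 <tag>
  <name>bar</name>
  <value>147</value>
 </tag>
</root>
XML

# Default KeyAttr: the <tag> list is folded into a hash keyed on <name>.
my $folded = XMLin($xml);

# KeyAttr => [] disables folding; each <tag> stays a hash in an array.
my $plain = XMLin($xml, KeyAttr => [], ForceArray => ['tag']);

for my $tag (@{ $plain->{tag} }) {
    print "$tag->{name} = $tag->{value}\n";
}
```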

There is one problem, though, that XML::Simple has not adequately handled yet: the inconsistency of

<root>
  <tag>content only</tag>
  <tag attr="1">and content</tag>
</root>

So all you have to do to get a nice, clean, consistent minimal datastructure out of the XML is to set ForceArray, KeyAttr and ForceContent accordingly. Big deal.
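To make the last case concrete, here is a sketch of the content-only versus attribute-plus-content XML above, parsed with and without ForceContent. It assumes ForceContent is used as a plain on/off switch (ForceContent => 1), which applies to every tag in the document; $mixed and $uniform are illustrative names.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<root>
  <tag>content only</tag>
  <tag attr="1">and content</tag>
</root>
XML

my %common = (KeyAttr => [], ForceArray => ['tag']);

# Without ForceContent: the first <tag> is a string, the second a hash.
my $mixed = XMLin($xml, %common);

# With ForceContent: every <tag> is a hash with a 'content' key.
my $uniform = XMLin($xml, %common, ForceContent => 1);

print "$_->{content}\n" for @{ $uniform->{tag} };
```

With all three options set, code that walks the structure no longer has to check ref() on each node.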

Besides, you can infer the tags that need ForceArray and ForceContent from the example XMLs, the DTD or the Schema. I actually already have the inference from example XMLs done for my XML::Rules, and it's trivial to change it to produce the options in the XML::Simple format. The upcoming version of XML::Rules will contain functions for inferring these options from examples and DTDs, for both modules.

P.S.: Sporti69, you may of course consider using my XML::Rules instead. With a little more work it can give you a more streamlined or even filtered and tweaked data structure, and it would allow you to process the XML in chunks instead of loading everything into memory first and only then giving you a chance to process anything.

P.P.S.: I did not discuss one "problem" of XML::Simple: it doesn't preserve the order of child tags. When was the last time you needed that when extracting data from a data-oriented XML? That information would just waste memory and possibly complicate the access in such applications. Of course it means that XML::Simple is not suited for document-oriented XML or for modifying XML that's supposed to be used by a stricter application. If you need that, use a different module.

Jenda
Support Denmark!
Defend the free world!
