in reply to Re^2: The best way to handle different type of XML files
in thread The best way to handle different type of XML files

I've found XML::Simple is really useful for prototyping a solution, frequently with either a dumbed-down schema or a subset of a real document. It lets me stub out the XML bit of the program so I can get the rest of the logic flowing.

After that, I usually end up rewriting that code with XML::XPath or its ilk.

I prefer to use XML::Simple when possible, because it takes a lot less care and feeding in the simplest cases. But as you pointed out, it blows up pretty quickly once the document is more than trivial.

Of course, it sounds like the OP has gotten past the prototype stage already. Frankly, it sounds like the project got a lot farther on XML::Simple than I would have expected.

Re^4: The best way to handle different type of XML files (Why I don't think much of XML::Simple)
by ikegami (Patriarch) on Nov 21, 2009 at 23:14 UTC

    I don't believe it. Let's compare the parsers by extracting the Person elements from the following very common structure:

    ...
    <Persons>
      <Person>...</Person>
      <Person>...</Person>
      <Person>...</Person>
    </Persons>
    ...

    The Persons element is optional and the number of Person elements is variable.

    • XML::Simple, without specifying a schema:

      my $persons = $parent->{Persons};
      my @persons =
          !$persons                          ? ()
        : !$persons->{Person}                ? ()
        : ref($persons->{Person}) ne 'ARRAY' ? $persons->{Person}   # lone Person: string or hash ref
        :                                      @{ $persons->{Person} };
    • XML::Simple, specifying a schema via ForceArray, etc:

      # Options passed to XMLin:
      GroupTags  => { Persons => 'Person' },
      ForceArray => [qw( Person )],

      # Extraction:
      my @persons = $parent->{Persons} ? @{ $parent->{Persons} } : ();
    • XML::LibXML:

      my @persons = $parent->findnodes('Persons/Person');
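
    For anyone who wants to run the XML::LibXML version end to end, here is a minimal sketch. The Root and Other elements, the sample names and the person_names helper are made up for illustration; only Persons/Person and the findnodes call come from the comparison above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample documents. Only Persons/Person come from the thread;
# Root, Other and the names are invented for this sketch.
my %samples = (
    none  => '<Root><Other/></Root>',
    one   => '<Root><Persons><Person>Alice</Person></Persons></Root>',
    three => '<Root><Persons><Person>Alice</Person>'
           . '<Person>Bob</Person><Person>Carol</Person></Persons></Root>',
);

# Return the text of every Person found under the document's root element.
sub person_names {
    my ($xml) = @_;
    my $parent = XML::LibXML->load_xml(string => $xml)->documentElement;

    # In list context findnodes returns a (possibly empty) list of nodes,
    # so a missing Persons element needs no special case.
    my @persons = $parent->findnodes('Persons/Person');
    return map { $_->textContent } @persons;
}

for my $label (sort keys %samples) {
    my @names = person_names($samples{$label});
    printf "%-5s => %d person(s): %s\n", $label, scalar(@names), join(', ', @names);
}
```

    The same one-line expression covers zero, one and many Person elements, which is the point being made above.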

    Did I pick an example that XML::Simple handles poorly? Let's do another extremely common example to demonstrate otherwise. Let's extract the person's country.

    ...
    <Person>
      ...
      <Country ...>...</Country>
      ...
    </Person>
    ...
    • XML::Simple, with default settings:

      my $country =
          !defined($person->{Country}) ? undef
        : !ref($person->{Country})     ? $person->{Country}
        :                                $person->{Country}{content};
    • XML::Simple, with ForceContent => 1:

      my $country = $person->{Country} && $person->{Country}{content};
    • XML::LibXML:

      my $country = $person->findvalue('Country');
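
    A runnable sketch of the findvalue case, under some assumptions: the Name element, the code attribute, the sample values and the country_of helper are all invented here. One difference worth knowing: as far as I can tell, findvalue gives back an empty string rather than undef when there is no Country element, so the result is not a drop-in match for the XML::Simple snippets.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical Person elements. The Name element and the "code" attribute
# are invented for illustration.
my @people = (
    '<Person><Name>Alice</Name><Country>Canada</Country></Person>',
    '<Person><Name>Bob</Name><Country code="FR">France</Country></Person>',
    '<Person><Name>Carol</Name></Person>',
);

# Return the text of the Country child of a single Person document.
sub country_of {
    my ($xml) = @_;
    my $person = XML::LibXML->load_xml(string => $xml)->documentElement;

    # Attributes on Country do not change the expression; only the
    # element's text is returned.
    return $person->findvalue('Country');
}

for my $xml (@people) {
    my $country = country_of($xml);
    print length($country) ? "country: $country\n" : "country: (none)\n";
}
```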

    XML::Simple code is insane without a schema. It's much simpler with one, but it's still longer and messier than with XML::LibXML. And it takes a lot of up-front time to create the schema, plus lots of headaches from the mistakes you make along the way.

    With XML::LibXML, I don't have to do any of the up-front extra work XML::Simple requires. So in addition to being a better production parser (simpler, 50x faster, etc.), it's a better prototyping parser too.
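
    To make the "schema" point concrete, here is a small sketch of how XML::Simple's output changes shape with the number of Person elements, and how ForceArray pins it down. The Root element, the sample names and the person_shape helper are assumptions made for this example; it relies on XML::Simple's default options apart from ForceArray.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical inputs. Root and the names are invented; Persons/Person
# are the elements discussed in the thread.
my $one_xml = '<Root><Persons><Person>Alice</Person></Persons></Root>';
my $two_xml = '<Root><Persons><Person>Alice</Person>'
            . '<Person>Bob</Person></Persons></Root>';

# Describe what XMLin stored under Persons -> Person for the given options.
sub person_shape {
    my ($xml, %opts) = @_;
    my $data = XMLin($xml, %opts);
    return ref($data->{Persons}{Person}) || 'plain string';
}

# With default options the structure depends on how many Persons there are.
print "default, one Person:    ", person_shape($one_xml), "\n";
print "default, two Persons:   ", person_shape($two_xml), "\n";

# ForceArray makes Person an array reference regardless of the count.
print "ForceArray, one Person: ",
    person_shape($one_xml, ForceArray => ['Person']), "\n";
```

    Every element that can repeat needs an entry like this before the extraction code can stop checking ref(), which is the up-front work described above.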

      XML::Simple code is insane without a schema. It's much simpler with, but it's still longer and messier than with XML::LibXML. And it takes a lot of up-front time to create the schema and lots of headaches from making mistakes. With XML::LibXML, I don't have to do any of that up-front extra work XML::Simple requires. So in addition to being a better production parser (simpler, 50x faster, etc), it's a better prototyping parser too.

      You got me thinking, and I think you're probably right.

      I'm playing with it now...