JackHammer has asked for the wisdom of the Perl Monks concerning the following question:

Just wondered if anybody had a good reason that this block of code should run out of memory. After a couple of hours of adding debug statements to my code I found that the following xml structure will cause an out of memory error when printed, but can be printed with dumper. Any help is greatly appreciated!
#!/usr/bin/perl
use XML::Simple;
use Data::Dumper;

my $xml = qq|<?xml version="1.0" standalone="no"?>
<!DOCTYPE document SYSTEM "whocares.dtd">
<document>
  <QuestionList>
    <Question>
      <Id>3</Id>
      <Text>Are dingos your friend?</Text>
      <Answer>Satisfactory</Answer>
    </Question>
  </QuestionList>
  <QuestionList>
    <Question>
      <Id>11</Id>
      <Text>Should this run out of memory?</Text>
      <Answer>No</Answer>
    </Question>
  </QuestionList>
</document>|;

my $xml_ref = XMLin($xml);
print $xml_ref->{'QuestionList'}->{'Question'}->[0]->{'Answer'};

Replies are listed 'Best First'.
Re: XML Parsing Out of Memory error
by msemich (Novice) on Feb 28, 2002 at 20:06 UTC
    Like everyone else has noted, JackHammer's initial path into the XML::Simple data structure is wrong. But the neat thing is that if he only had one QuestionList it would work. Why, you ask? Because that is what you told XML::Simple to do. If there is more than one element by the same name then you get an arrayref, otherwise you get a hash. Sounds like code-breaking behaviour to me; I guess XML::Simple is broken like that. Fear not, for those lovely folks that wrote XML::Simple saw that and said: "Let there be a standardization switch in XML::Simple", and then there was. It's just not turned on by default.

    As far as Perl screeching about being "Out of memory!", it's better than "Core dumped", but isn't it the same sort of invalid reference problem? Perl should retch, but it ought to leave with some parting words, IMHO. But that's for the Perl core hackers; I do well just to make things work at all.

    What you want is for XML::Simple to always make the hashrefs point to an arrayref, whether there is one item or more. Instantiate your Simple doc like this and everything will be in a standardized data set.
    my $xml_ref = XMLin($xml, keyattr => '');
    So your freshly parsed doc with a single QuestionList looks like this:
    [crash@shadowfax crash]$ ./this.pl
    $VAR1 = {
        'QuestionList' => [
            {
                'Question' => {
                    'Text' => 'Are dingos your friend?',
                    'Id' => '3',
                    'Answer' => 'Satisfactory'
                }
            },
            {}
        ]
    };
    Satisfactory[crash@shadowfax crash]$
    One of the projects I have worked on was a simple cXML system, and I am using XML::Parser::Checker and XML::Simple along with a data mapping hash to mask out the data from the documents we get in. I ran into the same problem trying to get that to work. Hope this clears things up with that.
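The array-versus-hash behaviour described in this reply can be checked directly. The following is a minimal sketch, assuming XML::Simple is installed; it uses the module's documented ForceArray option (with KeyAttr set to an empty list so nothing is folded into hashes by key). The trimmed-down XML and the variable names are illustrative and not taken from the original posts.

```perl
use strict;
use warnings;
use XML::Simple;

# Illustrative document with a single QuestionList and a single Question.
my $single = q|<document>
  <QuestionList>
    <Question>
      <Id>3</Id>
      <Text>Are dingos your friend?</Text>
      <Answer>Satisfactory</Answer>
    </Question>
  </QuestionList>
</document>|;

# Default options: an element that occurs once becomes a plain hash ref.
my $plain = XMLin($single);
print "default:     ", ref($plain->{QuestionList}), "\n";

# ForceArray with a list of element names: those elements are always
# array refs, whether they occur once or many times.
my $forced = XMLin(
    $single,
    ForceArray => [ 'QuestionList', 'Question' ],
    KeyAttr    => [],
);
print "forced:      ", ref($forced->{QuestionList}), "\n";

# The access path is now the same for one or many items.
my $answer = $forced->{QuestionList}->[0]->{Question}->[0]->{Answer};
print "answer:      $answer\n";
```

With the names listed in ForceArray, the same path works whether the input holds one QuestionList or several, so the calling code does not need to test for both shapes.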
Re: XML Parsing Out of Memory error
by mirod (Canon) on Feb 28, 2002 at 17:11 UTC

    If you dump $xml_ref you will find that the structure is not exactly what you think it is: $xml_ref->{'QuestionList'} is an array ref, and what you want to display is $xml_ref->{'QuestionList'}->[0]->{'Question'}->{'Answer'};.

    And Perl could be more explicit about the error!

      Sorry about the confusion, this is a snippet from a larger xml structure, I should have included more XML for the example. Try the following code instead:
      #!/usr/bin/perl
      use XML::Simple;
      use Data::Dumper;

      my $xml = qq|<?xml version="1.0" standalone="no"?>
      <!DOCTYPE document SYSTEM "whocares.dtd">
      <document>
        <QuestionList>
          <Question>
            <Id>3</Id>
            <Text>Are dingos your friend?</Text>
            <Answer>Satisfactory</Answer>
          </Question>
          <Question>
            <Id>3</Id>
            <Text>Should I have hit preview a second time?</Text>
            <Answer>Yes</Answer>
          </Question>
        </QuestionList>
        <QuestionList>
          <Question>
            <Id>11</Id>
            <Text>Should this run out of memory?</Text>
            <Answer>No</Answer>
          </Question>
        </QuestionList>
      </document>|;

      my $xml_ref = XMLin($xml);
      print Dumper $xml_ref;
      print $xml_ref->{'QuestionList'}->{'Question'}->[0]->{'Answer'};

      In the full code I check whether this is a list of questions or not. But if you try that one out, Question is indeed an array ref, as my print would suggest.

        Then you are looking for:

        $xml_ref->{'QuestionList'}->[0]->{Question}->[0]->{Answer}

        I have found using the debugger pretty useful in such a case: once the XML had been XMLin'd, I x'd $xml_ref->{'QuestionList'}, saw it was an array, then x'd $xml_ref->{'QuestionList'}->[0], then $xml_ref->{'QuestionList'}->[0]->{Question} and figured out it was also an array... hence the final code. I often use this technique to debug complex data structures.
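The level-by-level inspection described above can also be done without the debugger. The sketch below is one possible way, using only the built-in ref function. The data structure is a hand-written stand-in for what XMLin returns for the second example, and describe_path is a hypothetical helper name, not part of XML::Simple.

```perl
use strict;
use warnings;

# Hand-written stand-in for the parsed second example: two Questions in the
# first QuestionList (array ref), one in the second (plain hash ref).
my $xml_ref = {
    QuestionList => [
        {
            Question => [
                { Id => '3', Text => 'Are dingos your friend?', Answer => 'Satisfactory' },
                { Id => '3', Text => 'Should I have hit preview a second time?', Answer => 'Yes' },
            ],
        },
        {
            Question => { Id => '11', Text => 'Should this run out of memory?', Answer => 'No' },
        },
    ],
};

# describe_path (hypothetical helper) follows a list of hash keys and array
# indexes and records which kind of reference sits at each level.
sub describe_path {
    my ($node, @steps) = @_;
    my @seen = (ref($node) || 'scalar');
    for my $step (@steps) {
        if    (ref $node eq 'HASH')  { $node = $node->{$step} }
        elsif (ref $node eq 'ARRAY') { $node = $node->[$step] }
        else                         { last }    # nothing left to descend into
        push @seen, ref($node) || 'scalar';
    }
    return @seen;
}

my @types = describe_path($xml_ref, 'QuestionList', 0, 'Question', 0, 'Answer');
print join(' -> ', @types), "\n";
```

Reading the printed chain shows where an array index is needed and where a hash key is needed before the final access path is written down.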

Re: XML Parsing Out of Memory error
by dash2 (Hermit) on Feb 28, 2002 at 17:23 UTC
    Note that you get the same error from this code.

    # the output from Dumper, s/\$VAR1/\$xml_ref/
    $xml_ref = {
        'QuestionList' => [
            {
                'Question' => {
                    'Text' => 'Are dingos your friend?',
                    'Id' => '3',
                    'Answer' => 'Satisfactory'
                }
            },
            {
                'Question' => {
                    'Text' => 'Should this run out of memory?',
                    'Id' => '11',
                    'Answer' => 'No'
                }
            }
        ]
    };
    print $xml_ref->{'QuestionList'}->{'Question'}->[0]->{'Answer'};

    Note also, as I just have, that what you really want to do is

    print $xml_ref->{'QuestionList'}->[0]->{'Question'}->{'Answer'};

    I think that may solve your problem. Why Perl runs out of memory when you give it misleading deep nested references is another matter.

    dave hj~

Re: XML Parsing Out of Memory error
by maverick (Curate) on Feb 28, 2002 at 18:04 UTC
    This might actually be a perl autovivify bug. Check this out:
    use strict;
    my $ref;
    $ref->{'hash'}->[0]->{'hash2'}->{'hash3'} = 'go go gadget autovivify!';
    print $ref->{'hash'}->{'hash2'}->{'hash3'};

    Produces 'Out of memory!' when run.

    While this:

    use strict;
    my $ref;
    $ref->{'hash'}->[0]->{'hash2'} = 'go go gadget autovivify!';
    print $ref->{'hash'}->{'hash2'};

    produces 'Bad index while coercing array into hash at t3.pl line 6.'

    I tried a couple of different combinations, the trick to making it barf is to have TWO or more levels of referencing after the one you reference incorrectly.

    This is Perl 5.6.1 on Redhat Linux 7.2

    /\/\averick
    perl -l -e "eval pack('h*','072796e6470272f2c5f2c5166756279636b672');"

      It's actually a pseudohash bug, fixed in 5.7.2 (5.8 to be).

      Remember how pseudohashes work: they're actually arrays whose first entry is supposed to be a hash giving the position of each field. What's happening here is that perl is trying to create an enormous array. I can't recall the exact semantics of it, but it has something to do with extending the array to the size given by the memory location of "0", or something like that. Maybe someone can find the p5p discussion on this, so I don't have to look for it ;-)
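For comparison, here is a minimal sketch of the same mistake on a newer perl. It assumes Perl 5.10 or later, where pseudohashes no longer exist, so treating an array ref as a hash dies with an ordinary "Not a HASH reference" error instead of exhausting memory. The helper first_item is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;

my $ref;
$ref->{'hash'}->[0]->{'hash2'}->{'hash3'} = 'go go gadget autovivify!';

# first_item (hypothetical helper) checks the reference type before use,
# so the caller copes with either an array ref or a single hash ref.
sub first_item {
    my ($node) = @_;
    return ref $node eq 'ARRAY' ? $node->[0] : $node;
}

# The wrong path from the post, wrapped in eval so the failure is
# reported instead of ending the program.
my $value = eval { $ref->{'hash'}->{'hash2'}->{'hash3'} };
my $error = $@;
print "wrong path failed: $error" if $error;

# The tolerant path: look at what is actually there before dereferencing.
my $ok = first_item($ref->{'hash'})->{'hash2'}->{'hash3'};
print "$ok\n";
```

Checking ref before dereferencing, or wrapping the access in eval, keeps a wrong guess about the structure from becoming a fatal error, whatever the perl version.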

Re: XML Parsing Out of Memory error
by lordsuess (Scribe) on Feb 28, 2002 at 18:14 UTC
    I've had rather big problems with perl 5.6.1 and larger data structures such as hashes of hashes of lists and so on. With perl 5.005_03, such code uses about 30% less RAM, and the runtime improves by up to 80%.

    Although this might not really be a solution for you, maybe it could help you until you find a better way.