adrianxw has asked for the wisdom of the Perl Monks concerning the following question:

Good day good monks. At the end of October I was kindly helped (in Using the result from XML::XQL::solve) to get a program working. That program has run flawlessly every two hours since.

I had a second need, basically doing the same job as the first, but retrieving the XML from a different site.

I retrieve and decompress the XML without incident, however, when this new version of the program tries to process the XML, the program crashes.

The line that crashes the program is this...
$usernodeset = $xpu->find("/users/user[teamid=$CONFIG{teamid}]");
... which is identical to the line in the working script.

When the script gets to that line, the program's screen I/O stops, but the task is still running. Perl.exe uses 40-50% of the CPU, and I can see the page file usage rising incrementally. After 5-6 minutes, I get an Application Error message box which says...

The instruction at "0x28089a3d" referenced memory at "0x00000004". The memory could not be "written".

... after clicking OK, the console prompt returns.

I have visually checked the XML and it seems perfectly valid, and in the same format as that obtained from the other site. What is different is the size: the working program processes a ~9MB XML buffer, while the failing one processes a ~32MB buffer.

I notice the page file usage initially rises in a large jump, then in slow increments until the task seems to have reached about 2GB, which is perhaps significant.

The documentation does not state a size limit for the XML to be searched.

Have I hit a limit or is this message indicative of something that I am missing?

*** UPDATE ***

Following the suggestions below, I have succeeded in getting both a Rules version and a Twig version to work. I have settled on the Twig version as it seems to run a little faster.

The Twig parser also crashes if I do not call purge in the twig_handler routine, so there is an undocumented limit there as well.

Replies are listed 'Best First'.
Re: XPath crashing
by Herkum (Parson) on Feb 04, 2007 at 18:24 UTC

    This is an interpreter error rather than a script or module error. To be honest, there is not much you can do on the programming side to fix it.

    Your options boil down to:

    1. There may be bad RAM on the system; highly unlikely, but it is a possibility.
    2. The library the interpreter is using may be the culprit. Upgrade the version of Perl or, in an extreme case, downgrade the version that is installed. I don't really recommend downgrading, though, unless you have no other choice.
    3. The OS may be the problem. Try moving the program to another machine, or try it under another version of Windows or possibly a *NIX system. If you go to a new OS, the interpreter may not have this bug at all.
      1. I will, as soon as possible, run Memtest86 to verify the hardware. I agree this is worthwhile, but unlikely.

      2. I expect my Perl is not the most up-to-date (not far off, but possibly somewhat crusty). I can certainly investigate this. Similarly the module: although I downloaded it last autumn, I do not know how frequently it is updated. Again, I can investigate this.

      3. Whilst I may be able to try this on other OSes, it will not solve my problem. The target OS is XP; that is a constraint.

      If this is a size issue, I could possibly chop the XML buffer into a number of smaller buffers by traditional means, then call the processing routine repeatedly, each invocation processing, say, 1000 records.
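
      Something along these lines, perhaps (an untested sketch; it assumes the <user> records are flat and never nested, and that the object takes the buffer via the xml => parameter):

      # Pull the individual <user>...</user> records out of the big buffer.
      my @records = $xml =~ m{<user>.*?</user>}gs;

      # Process in batches of 1000, wrapping each batch in a small document.
      while ( my @batch = splice @records, 0, 1000 ) {
          my $xp = XML::XPath->new( xml => '<users>' . join( '', @batch ) . '</users>' );
          my $nodeset = $xp->find("/users/user[teamid=$CONFIG{teamid}]");
          # ... process $nodeset as before ...
      }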

      I create the object with an xml => abc parameter. Is there a formal destructor or reset method for this object, or do I need to create a new object for each inquiry? The documentation at CPAN does not show a destructor.

        Well, another option is to just use another module; I recommend XML::Twig as another possibility. It can deal with large files very well, because you can flush (clear from memory) the parts of the document you do not need anymore. This reduces the in-memory size of the document as well.
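
        A bare-bones sketch of that approach, assuming the <users>/<user>/<teamid> layout shown further down the thread; $the_xml and @wanted are placeholders. purge discards what has been parsed so far without printing it (use flush instead if you need to write the document back out):

        use XML::Twig;

        my @wanted;
        my $twig = XML::Twig->new(
            twig_handlers => {
                user => sub {
                    my ( $t, $user ) = @_;
                    push @wanted, $user->simplify
                        if $user->first_child_text('teamid') eq $CONFIG{teamid};
                    $t->purge;    # throw away everything parsed so far
                },
            },
        );
        $twig->parse($the_xml);    # or parsefile(...) for a file on disk
        # @wanted now holds hashrefs for the matching <user> records only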

        Just to update: memtest86 showed the RAM to be fine. The XPath I have is the latest available. My Perl core was slightly out of date (5.8.7); I have updated to 5.8.8 this evening. The error persists.
Re: XPath crashing
by Jenda (Abbot) on Feb 04, 2007 at 23:32 UTC

    Loading 32MB of XML into a DOM structure is bound to use quite a lot of memory, but it seems the XML::XPath->find() is buggy. In either case you'd be better off using something that doesn't build the whole structure, especially if, as you said in the other node, you only need very little out of the large XML. XML::Twig or maybe XML::Rules ;-)

    It's hard to give you an example without knowing the XML, but using XML::Rules you might start with:

    my $parser = XML::Rules->new(
        rules => [
            'computers,and,other,tags,you do not care about at all' => 'skip',
            '^user' => sub { return $_[1]->{teamid} eq $CONFIG{teamid} },
            'user' => 'as array',
            # only the <user> tags containing the right attribute will be processed
            '_default' => 'as is',
            'tags,with,no,attributes' => 'content',
        ]
    );
    my $data = $parser->parse($the_xml);
    And you end up with a data structure containing only the content of the <user> tags you are interested in. And at no time is the whole document in memory, the module only keeps the data you wanted and the attributes of the yet unclosed nodes in memory.

      Thanks for the steers; I have been browsing the Twig and Rules documentation today. I will download the packs this evening and experiment with them.

      The XML is very simple...

      <users>
        <user>
          <!-- 5-6 simple keys here -->
          <teamid>100</teamid>
        </user>
      </users>
      ... there are many thousands of user tags, but I am only interested in the 100-150 records with the correct teamid.

        It's a shame the teamid is a subtag and not an attribute: it means you will be unable to use the "start" rules, which will make the script a bit slower, though not more memory hungry. In this case I think

        my $parser = XML::Rules->new(
            rules => [
                _default => 'content',
                user => sub {
                    return unless $_[1]->{teamid} == $_[4]->{parameters}{teamid};
                    delete $_[1]->{_content}; # delete the textual content (whitespace)
                    return '@user' => $_[1];
                },
                users => 'pass no content',
            ]
        );
        my $users = $parser->parsefile( $the_xml_file, {teamid => 100} );
        should give you the data for the right users only.
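
        If I read the rules right, $users should come back as something like { user => [ { teamid => 100, ... }, ... ] } (that shape is my assumption, not gospel), so the matches can be walked with:

        for my $user ( @{ $users->{user} } ) {
            # each $user is a hashref of the subtag contents for one record
            print "$user->{teamid}\n";
        }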

Re: XPath crashing
by Trizor (Pilgrim) on Feb 05, 2007 at 00:41 UTC

    I don't have access to a system with XML::XPath installed as of this writing; when I do, I will update this if necessary. But I believe the problem is that your XPath query may contain a syntax error: if $CONFIG{teamid} is a string, it needs to be in single quotes to form a string-equals predicate (the part in brackets). XML::XPath will sometimes fail spectacularly when given a bad query.

    If $CONFIG{teamid} is a string, the query should look like this:
    $usernodeset = $xpu->find("/users/user[teamid='$CONFIG{teamid}']");
      The teamid value is an integer. Reading the XPath spec, it should not be necessary to quote it. I can try it, of course!