adrianxw has asked for the wisdom of the Perl Monks concerning the following question:

Good day good monks. At the end of October I was kindly helped (in Using the result from XML::XQL::solve) to get a program working. That program has run flawlessly every two hours since.

I had a second need, basically doing the same job as the first, but retrieving the XML from a different site.

I retrieve and decompress the XML without incident, however, when this new version of the program tries to process the XML, the program crashes.

The line that crashes the program is this...
$usernodeset = $xpu->find("/users/user[teamid=$CONFIG{teamid}]");
... which is identical to the line in the working script.

When the script gets to that line, the program's screen I/O stops, but the task is still running. Perl.exe uses 40-50% of the CPU, and I can see the page file usage rising incrementally. After 5-6 minutes, I get an Application Error message box which says...

The instruction at "0x28089a3d" referenced memory at "0x00000004". The memory could not be "written".

... after clicking OK, the console prompt returns.

I have visually checked the XML and it seems perfectly valid, and in the same format as that obtained from the other site. What is different is the size: the working program processes a ~9MB XML buffer, while the failing one processes a ~32MB buffer.

I notice the page file usage initially rises in a large jump, then in slow increments until the task seems to have reached about 2GB, which is perhaps significant.

The documentation does not state a size limit for the XML to be searched.

Have I hit a limit or is this message indicative of something that I am missing?

*** UPDATE ***

Following the suggestions below, I have succeeded in getting both a Rules version and a Twig version to work. I have settled on the Twig version as it seems to run a little faster.

The Twig parser also crashes if I do not call purge in the twig_handler routine, so there is an undocumented limit there as well.

Replies are listed 'Best First'.
Re: XPath crashing
by Herkum (Parson) on Feb 04, 2007 at 18:24 UTC

    This is an interpreter error rather than a script or module error. To be honest, there is not much you can do on the programming side to fix it.

    Your options boil down to:

    1. There may be bad RAM on the system; highly unlikely, but it is a possibility.
    2. The library the interpreter is using may be the culprit. Upgrade the version of Perl or, in an extreme case, downgrade the version that is installed. I don't really recommend downgrading, though, unless you have no other choice.
    3. The OS may be the problem. Try moving the program to another machine, or try it under another version of Windows or possibly a *NIX system. If you go to a new OS, the interpreter may not have this bug at all.
      1. I will, as soon as possible, run Memtest86 to verify the hardware. I agree this is worthwhile, but unlikely.

      2. I expect my Perl is not the most up-to-date (not far off, but possibly somewhat crusty). I can certainly investigate this. Similarly the module: although I downloaded it last autumn, I do not know how frequently it is updated. Again, I can investigate this.

      3. Whilst I may be able to try this on other OSes, it will not solve my problem. The target OS is XP; that is a constraint.

      If this is a size issue, I could possibly chop the XML buffer into a number of smaller buffers by traditional means, then call the processing routine repeatedly, each invocation processing, say, 1000 records.
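
      Something along these lines, perhaps (an untested sketch; it assumes the <user> records are flat and never nested, and that the object takes the buffer via the xml => parameter):

      # Pull the individual <user>...</user> records out of the big buffer.
      my @records = $xml =~ m{<user>.*?</user>}gs;

      # Process in batches of 1000, wrapping each batch in a small document.
      while ( my @batch = splice @records, 0, 1000 ) {
          my $xp = XML::XPath->new( xml => '<users>' . join( '', @batch ) . '</users>' );
          my $nodeset = $xp->find("/users/user[teamid=$CONFIG{teamid}]");
          # ... process $nodeset as before ...
      }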

      I create the object with an xml => abc parameter. Is there a formal destructor or reset method for this object, or do I need to create a new object for each inquiry? The documentation at CPAN does not show a destructor.

        Well, another option is to just use another module; I recommend XML::Twig as another possibility. It can deal with large files very well, because you can flush (clear from memory) the parts of the document you do not need anymore. This reduces the in-memory size of the document as well.
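
        A bare-bones sketch of that approach, assuming the <users>/<user>/<teamid> layout shown further down the thread; $the_xml and @wanted are placeholders. purge discards what has been parsed so far without printing it (use flush instead if you need to write the document back out):

        use XML::Twig;

        my @wanted;
        my $twig = XML::Twig->new(
            twig_handlers => {
                user => sub {
                    my ( $t, $user ) = @_;
                    push @wanted, $user->simplify
                        if $user->first_child_text('teamid') eq $CONFIG{teamid};
                    $t->purge;    # throw away everything parsed so far
                },
            },
        );
        $twig->parse($the_xml);    # or parsefile(...) for a file on disk
        # @wanted now holds hashrefs for the matching <user> records only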

        Just to update: memtest86 showed the RAM to be fine. The XPath I have is the latest available. My Perl core was slightly out of date (5.8.7); I have updated to 5.8.8 this evening. The error persists.
Re: XPath crashing
by Jenda (Abbot) on Feb 04, 2007 at 23:32 UTC

    Loading 32MB of XML into a DOM structure is bound to use quite a lot of memory, but it seems the XML::XPath->find() is buggy. In either case you'd be better off using something that doesn't build the whole structure, especially if, as you said in the other node, you only need very little out of the large XML. XML::Twig or maybe XML::Rules ;-)

    It's hard to give you an example without knowing the XML, but using XML::Rules you might start with:

    my $parser = XML::Rules->new(
        rules => [
            'computers,and,other,tags,you do not care about at all' => 'skip',
            '^user' => sub { return $_[1]->{teamid} eq $CONFIG{teamid} },
            'user' => 'as array',
            # only the <user> tags containing the right attribute will be processed
            '_default' => 'as is',
            'tags,with,no,attributes' => 'content',
        ]
    );
    my $data = $parser->parse($the_xml);
    And you end up with a data structure containing only the content of the <user> tags you are interested in. And at no time is the whole document in memory, the module only keeps the data you wanted and the attributes of the yet unclosed nodes in memory.

      Thanks for the steers; I have been browsing the Twig and Rules documentation today. I will download the packs this evening and experiment with them.

      The XML is very simple...

      <users>
        <user>
          <!-- 5-6 simple keys here -->
          <teamid>100</teamid>
        </user>
      </users>
      ... there are many thousands of user tags, but I am only interested in the 100-150 records with the correct teamid.

        It's a shame the teamid is a subtag and not an attribute: it means you will be unable to use the "start" rules, which will make the script a bit slower, though not more memory hungry. In this case I think

        my $parser = XML::Rules->new(
            rules => [
                _default => 'content',
                user => sub {
                    return unless $_[1]->{teamid} == $_[4]->{parameters}{teamid};
                    delete $_[1]->{_content}; # delete the textual content (whitespace)
                    return '@user' => $_[1];
                },
                users => 'pass no content',
            ]
        );
        my $users = $parser->parsefile( $the_xml_file, {teamid => 100} );
        should give you the data for the right users only.
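
        If I read the rules right, $users should come back as something like { user => [ { teamid => 100, ... }, ... ] } (that shape is my assumption, not gospel), so the matches can be walked with:

        for my $user ( @{ $users->{user} } ) {
            # each $user is a hashref of the subtag contents for one record
            print "$user->{teamid}\n";
        }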

Re: XPath crashing
by Trizor (Pilgrim) on Feb 05, 2007 at 00:41 UTC

    I don't have access to a system with XML::XPath installed as of this writing; when I do, I will update this if necessary. But I believe the problem is that your XPath query may contain a syntax error: if $CONFIG{teamid} is a string, it needs to be in single quotes to form a string-equals predicate (the part in brackets). XML::XPath will sometimes fail spectacularly when given a bad query.

    If $CONFIG{teamid} is a string, the query should look like this:
    $usernodeset = $xpu->find("/users/user[teamid='$CONFIG{teamid}']");
      The teamid value is an integer. Reading the XPath spec, it should not be necessary to quote it. I can try it, of course!