PerlMonks |
Problem timing out XML::LibXML parsing
by samtregar (Abbot)
on Feb 03, 2009 at 19:40 UTC ( [id://741100]=perlquestion )
samtregar has asked for the wisdom of the Perl Monks concerning the following question:
Hello all. I'm using XML::LibXML to parse some HTML. Mostly it's working great - fast and very useful XPath support. My problem is that it's choking on some very bad HTML in a very bad way - it's sitting on the CPU until killed manually. I expected some HTML wouldn't parse, so this isn't such a tragedy. What is a big problem is that my attempts to work around this with alarm() aren't working!
Here's my code:
If I replace the parse call with sleep(20) then it works as expected - the alarm triggers and the timeout is caught. If I run it as-is with my sample HTML then it never stops until killed.

If you want to play along at home here's the test file: http://sam.tregar.com/libxml-fail.html

BEWARE: that's some really bad HTML and it not only breaks XML::LibXML but it also crashed Firefox while I was writing this post the first time! You probably don't want to load it in your browser.

I've never had alarm() fail like this. Is there an alternative I can try? Any other ideas about how to handle this?

Thanks!
-sam

UPDATE: perrin reminded me about how safe-signals work in recent perls. That is indeed the problem - setting PERL_SIGNALS=unsafe makes my code DWIM, at the cost of a certain degree of safety. Ideas for alternatives are still welcome of course.
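
One possible alternative to setting PERL_SIGNALS=unsafe for the whole process is to install the SIGALRM handler through POSIX::sigaction. Handlers installed that way are delivered immediately unless the "safe" flag is turned on, so the unsafe behavior can be limited to the one call that needs it. The following is only a minimal sketch under that assumption: run_with_timeout is a hypothetical helper name, not code from the original post, and dying out of a C library mid-call carries the same risks as PERL_SIGNALS=unsafe.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h);    # exports SIGALRM and sigaction

# Hypothetical helper (not from the original post): run $code with a limit
# of $seconds, dying with "timeout\n" if the alarm fires first.
sub run_with_timeout {
    my ($seconds, $code) = @_;

    # A handler installed via POSIX::sigaction is not deferred the way a
    # %SIG handler is, so it can interrupt a long-running XS/C call.
    my $action = POSIX::SigAction->new(sub { die "timeout\n" });
    my $old    = POSIX::SigAction->new();
    sigaction(SIGALRM, $action, $old)
        or die "sigaction failed: $!";

    my @result;
    my $ok = eval {
        alarm($seconds);
        @result = $code->();
        alarm(0);
        1;
    };
    my $err = $@;

    alarm(0);                      # make sure no alarm is left pending
    sigaction(SIGALRM, $old);      # restore whatever handler was there before

    die $err unless $ok;
    return wantarray ? @result : $result[0];
}
```

It could then be wrapped around the parse, for instance eval { run_with_timeout(10, sub { XML::LibXML->new->parse_html_string($html) }) }, checking $@ for "timeout\n" afterwards. The parse_html_string call and the $html variable here are stand-ins for whatever parse call the real code makes. Since the parser is interrupted mid-call, it is probably wise to throw that parser object away after a timeout rather than reuse it.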