Hi,

I'm using very large and complex regular expressions on very large and complex pieces of data. The regexes work well and are quite fast when they match. However, when they don't match, they consume all the free cycles on my high-end server until Apache kills them (this is a CGI application). This seems to be directly related to the size of the data and the complexity of the regular expression. It's quite plausible that there's something wrong with the regular expressions being used, but because in this particular circumstance I am allowing users to enter the regular expressions, I would like to limit the time and/or cycles of the regex processing, no matter the regex.

I've tried to use an alarm() call wrapped inside an eval{} to act as a timeout, but it seems that because of the problems with the regex itself, the alarm is never received and/or handled properly. To summarize, it's used like this:

    eval {
        local $SIG{ALRM} = sub { die('ALARM'); };
        alarm($REGEX_TIMEOUT);
        @return = ($raw =~ /$regex/msgx);
        alarm(0);
    };
    if ($@ =~ /ALARM/) { ... }

So the questions here are obvious: what am I doing wrong that creates the conditions for a 'runaway regex'? How can I contain one and limit it to a maximum amount of time or processing? Why doesn't the alarm ever call die()?
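
On the alarm question, one likely factor: since Perl 5.8, signal handlers are deferred until the current opcode finishes, and a single regex match is one opcode, so the ALRM handler would not run until the match returns. One way to contain a match regardless of that is to run it in a child process that the parent can kill. The following is only a minimal sketch of that idea, assuming fork() is usable in this CGI setup and that the match results are small enough to send back over a pipe. The name match_with_timeout and its return structure are hypothetical, made up for illustration.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();
use Storable ();
use Time::HiRes qw(time);

# Hypothetical helper: run $raw =~ /$regex/msgx in a child process and
# give up after $timeout seconds. Returns a hashref whose status is
# 'ok' (with matches => [...]), 'timeout', or 'error'.
sub match_with_timeout {
    my ($raw, $regex, $timeout) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the match and send the serialized results back.
        close $reader;
        binmode $writer;
        my $ok = eval {
            my @matches = ($raw =~ /$regex/msgx);
            print {$writer} Storable::freeze(\@matches);
            close $writer;
            1;
        };
        # _exit avoids running the parent's END blocks or flushing its buffers.
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: read until EOF or until the deadline passes.
    close $writer;
    binmode $reader;
    my $select    = IO::Select->new($reader);
    my $deadline  = time() + $timeout;
    my $frozen    = '';
    my $timed_out = 0;

    while (1) {
        my $remaining = $deadline - time();
        my @ready = $remaining > 0 ? $select->can_read($remaining) : ();
        unless (@ready) {
            $timed_out = 1;
            last;
        }
        my $n = sysread($reader, $frozen, 65536, length $frozen);
        die "read failed: $!" unless defined $n;
        last if $n == 0;    # EOF: the child closed its end of the pipe
    }

    # SIGKILL cannot be deferred or ignored by the child.
    kill 'KILL', $pid if $timed_out;
    waitpid($pid, 0);
    close $reader;

    return { status => 'timeout' } if $timed_out;
    return { status => 'error' }   if $? != 0 or !length $frozen;
    return { status => 'ok', matches => Storable::thaw($frozen) };
}
```

In the snippet above, the direct match would become a call such as match_with_timeout($raw, $regex, $REGEX_TIMEOUT), with @return filled from the matches entry only when the status is 'ok'. The cost is one fork per match, which may or may not be acceptable depending on how often the CGI runs.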

TIA,
Scott


In reply to Losing control of large regular expressions by scottb
