in reply to Test RegEx Validity

I like simonm's and especially Fletch's tricks. I'll also offer that it's possible to write a colossally bad regular expression that might take hours or decades to match, whether intentionally or accidentally. So if this isn't a personal-box project, you should probably wrap the actual execution of the expression in a timeout alarm, something like this (mostly from the Cookbook):

    eval {
        local $SIG{ALRM} = sub { die "Bad rx timed out" };
        alarm 5;    # start the timer
        eval {
            # the user's rx is run here...
        };
        alarm 0;    # clear the timer
    };
    alarm 0;        # in case something else went wrong
    die $@ if $@ and $@ !~ /rx timed out/;

Re^2: Test RegEx Validity
by BUU (Prior) on Nov 28, 2004 at 02:32 UTC
    At least on recent versions of Perl, alarm won't time out a regex match. Your only real choice is to use one of the regex parser modules (Regexp::Parser, for instance) and build your own regex that calls a timeout sub after every op.
      This regex timeout issue, in a slightly different context, is a problem I am having as well. I appreciate the light being shed by all the sages, even though my humble novice mind is still struggling in the swamp of Confusion.

      My situation is that:

      1) I didn't really understand BUU's last post.

      2) I am intimidated by Regexp::Parser.

      3) I am looking for a general solution for the issue of what to do when sometimes a program takes too long for whatever reason and you need to give up and forget it.

      Rather than slink away into the shadows, I followed up BUU's breadcrumbs with Google searches:

      http://www.google.de/search?hl=de&q=perl+%22timeout+sub%22&btnG=Google-Suche&meta=

      and

      http://www.google.de/search?hl=de&q=perl+%22unreliable+program%22&btnG=Suche&meta=

      which brought me back to perlmonks:

      http://perlmonks.thepen.com/220132.html

      "Using Expect.PM to manage an unreliable program."

      Is this a corridor that we novices who wish to manage regex timeout problems would do well to explore... or will it only confuse us more?

      Humbly,

      Thomas.

        From perl58delta:
        Perl used to be fragile in that signals arriving at inopportune moments could corrupt Perl's internal state. Now Perl postpones handling of signals until it's safe (between opcodes). This change may have surprising side effects because signals no longer interrupt Perl instantly. Perl will now first finish whatever it was doing, like finishing an internal operation (like sort()) or an external operation (like an I/O operation), and only then look at any arrived signals (and before starting the next operation). No more corrupt internal state since the current operation is always finished first, but the signal may take more time to get heard. Note that breaking out from potentially blocking operations should still work, though.
        The key sentence there is "Now Perl postpones handling of signals until it's safe (between opcodes)." This means that any operation that is a single opcode, such as a regexp match, can't be interrupted by a signal. And a signal is all that alarm is, after all: it just tells the kernel to send a SIGALRM after a certain amount of time. The upshot is that alarm won't interrupt a regex that runs forever if you are using Perl 5.8 or later (which you really should be). Unless, of course, you're using Perl 5.9, which might or might not interrupt. But you probably shouldn't be using Perl 5.9, so that's a moot point.

        Anyway, on to actually answering your question: no, you can't really use Expect for this. Expect is for managing *external* programs, so unless you want to run your regexp in an external script, Expect won't help. Of course, you could run it in an external script and attempt to time it out that way, but you don't need Expect for that anyway.
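The external-script route mentioned above can be approximated without a separate file by forking. What follows is only a minimal sketch, assuming a Unix-like system where fork and alarm are available. The helper name match_in_child and its return convention are hypothetical, not something from this thread. The child installs no Perl-level handler, so SIGALRM keeps its default action and the kernel terminates the child directly instead of waiting for the next opcode.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (an assumption for illustration).
# Returns 1 on match, 0 on no match, undef on timeout or invalid pattern.
sub match_in_child {
    my ($pattern, $text, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: make sure SIGALRM has its default (fatal) disposition,
        # then arm the timer and attempt the match.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;
        my $matched = eval { $text =~ /$pattern/ ? 1 : 0 };
        # Exit code 0 = matched, 1 = no match, 2 = pattern failed to compile.
        POSIX::_exit(!defined($matched) ? 2 : $matched ? 0 : 1);
    }

    # Parent: wait for the child and decode how it ended.
    waitpid($pid, 0);
    return undef if $? & 127;    # child was killed by a signal (the alarm)
    my $code = $? >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

my $result = match_in_child('^a+$', 'aaaa', 5);
print defined $result ? "result: $result\n" : "timed out or invalid\n";
```

The only thing that comes back from the child is an exit code, so this suits a yes/no validity or match test. Returning captures would need a pipe or similar, which is left out here.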

        As for the general problem ("I am looking for a general solution for the issue of what to do when sometimes a program takes too long for whatever reason and you need to give up and forget it."): if the "program" in question is external to your script, timing it and killing it is a rather simple matter. But as of Perl 5.8, ensuring from the inside, as it were, that a Perl script will only run for a certain amount of time is rather difficult. The only thing I can think of offhand, if you don't want to mess with Regexp::Parser (which is admittedly scary), is PerlInterp, which embeds a Perl interpreter inside your Perl script. =] You might be able to time that out with an alarm. Then again, maybe not.
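For the external-program case described as simple above, one possible shape is the following sketch, again assuming a Unix-like system. The helper name run_with_timeout is hypothetical. It relies on the point from the perl58delta quote that breaking out of a blocking operation (here, waitpid) still works with an alarm.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (an assumption for illustration).
# Runs @cmd and returns its exit code, or undef if it ran past $seconds.
sub run_with_timeout {
    my ($seconds, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: replace this process with the external program.
        { exec { $cmd[0] } @cmd; }
        POSIX::_exit(127);    # only reached if exec itself failed
    }

    my $finished = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm $seconds;
        waitpid($pid, 0);     # blocking call that the alarm can break out of
        alarm 0;
        1;
    };
    alarm 0;                  # in case something else went wrong

    unless ($finished) {
        die $@ unless $@ eq "timed out\n";
        kill 'KILL', $pid;    # give up on the child...
        waitpid($pid, 0);     # ...and reap it so no zombie is left behind
        return undef;
    }
    return $? >> 8;
}

# $^X is the path of the running perl, used here as a stand-in command.
my $status = run_with_timeout(2, $^X, '-e', 'exit 3');
print defined $status ? "exit code: $status\n" : "gave up\n";
```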