in reply to Stop runaway regex

As LanX had guessed, non-deferred signals can do what you want. It's not that pretty, but it works.

    #!/usr/bin/perl
    use strict;
    use warnings;
    $|++;

    my $s = 0;

    use POSIX qw(SIGALRM);
    POSIX::sigaction(SIGALRM, POSIX::SigAction->new(
        sub { warn "skipping $s (took too long)\n"; die }
    )) || die "Error setting SIGALRM handler: $!\n";

    my $str = 'ffvadsvefwdvewrfvt4vketwrhjkbveqwkjhfkghjlfghjkufghjkfhjkfjkgfghfkhjfkhjgfhjgfhgfhkgfhkgfhkgfhkgfkhjgfkjgfkghjfkhjgfhjgfkhjgfhjkfk' x 40960;
    $str .= 'hjkbklklhbjklercvqewrqereqrfqeerv;;;jnrveervnlknrvlerlvnerlnvelrvnervlkenvlervojubnertvffff;kn;kff;kn;fk;k;;kmnff;knmf;nff;mnkf;;k;;' x 40960;
    my $str2 = $str x 8;
    my $str3 = 'furrfu';
    my $re   = qr/(f((\w?)(\w*?))?)+/;

    print time . "\n\n";
    for ( $str, $str2, $str3 ) {
        $s++;
        my $res;
        alarm 2;
        eval { $res = $_ =~ s/$re/ ^_^ /g; };
        print "$s made $res\n" unless $@;
    }
    print "\n" . time . "\n";
    exit;

On a system slow enough that the second string exceeds the 2-second alarm but the first does not (nor the third -- gosh, let's hope not!), this prints something like the following.

    1401392795

    1 made 1310721
    skipping 2 (took too long)
    3 made 2

    1401392798

This is perl 5, version 16, subversion 2 (v5.16.2) built for darwin-thread-multi-2level (with 3 registered patches, see perl -V for more detail)

Update: Per davido's advice in the thread, I'll point out that the above is tested but not thoroughly so. One might hope that the die() and ending the eval would be enough unrolling of state that no segfaults or other wonkiness would happen when the regular expression engine is interrupted and reinvoked. If not, the forking model does make a lot of sense. One might also put code for handling one regex at a time into a separate script and hand off to that with something like IPC::Open3 which will handle parts of the child management and inter-process communication for you.
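For what it's worth, here is a minimal sketch of the forking model mentioned above. It is not code from this thread: the helper name subst_count_with_timeout is made up for illustration, and it uses a plain fork and pipe with IO::Select rather than IPC::Open3. The regex runs in a child process, so the parent never interrupts its own regex engine; it just kills the child if no result arrives in time.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical helper (not from the thread): run one substitution in a
# forked child and give up after $timeout seconds. Returns the number of
# substitutions, or undef if the child had to be killed.
sub subst_count_with_timeout {
    my ($str, $re, $timeout) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!\n";
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: do the risky work and report the count through the pipe.
        close $reader;
        my $count = ($str =~ s/$re/ ^_^ /g) || 0;
        print {$writer} "$count\n";
        close $writer;          # flushes the buffered line
        POSIX::_exit(0);        # skip END blocks inherited from the parent
    }

    # Parent: wait for the pipe to become readable, but not forever.
    close $writer;
    my $line;
    if (IO::Select->new($reader)->can_read($timeout)) {
        $line = <$reader>;      # undef if the child died without writing
    }
    else {
        kill 'KILL', $pid;      # took too long: stop the child
    }
    close $reader;
    waitpid $pid, 0;            # reap the child either way

    return unless defined $line;
    chomp $line;
    return $line;
}

my $n = subst_count_with_timeout('furrfu', qr/(f((\w?)(\w*?))?)+/, 2);
print defined $n ? "made $n\n" : "skipped (took too long)\n";
```

Note that the substitution only modifies the child's copy of the string. This sketch sends back just the count; if you need the modified string, you would have to write that through the pipe as well.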

Re^2: Stop runaway regex
by LanX (Saint) on May 30, 2014 at 15:10 UTC
    FYI:

    Even w/o deferred signals I was able to create segfaults with alarm (5.10).

    Just by putting simple (?{code}) into the regex to catch the alarm-signal.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      I was unable to get a segfault testing the exact code I posted on either 5.10.1 on CentOS or 5.16.2 on OS X. I only tested it with the strings and regex listed in the code, though.

        I hope it's fixed in newer versions. Here is what I tried to reproduce it:

        I just took the example code from the alarm documentation and embedded (dummy) code in a long-running regex.

        Generally I try to avoid alarm...

            use strict;
            use warnings;

            my $start;
            my $diff;
            my $timeout;

            sub tst {
                $timeout = shift;
                my $str = "a" x 10000;
                $str .= "b";

                eval {
                    local $SIG{ALRM} = sub { die "alarm\n" };    # NB: \n required
                    alarm $timeout;
                    $str =~ /^
                             ((
                                 a*
                                 (?{1})    # dummy code
                             )*)*
                             $/x;
                    alarm 0;
                };
                if ($@) {
                    die unless $@ eq "alarm\n";    # propagate unexpected errors
                    die "timed out after $timeout sec :" . time();
                }
                else {
                    print "normal";
                }
            }

            tst($_) for 2;

        output

            Complex regular subexpression recursion limit (32766) exceeded at /home/lanx/B/PL/PM/timeout_regex.pl line 18.

            Compilation segmentation fault at Fri May 30 17:41:48

        Cheers Rolf

        ( addicted to the Perl Programming Language)