in reply to Losing control of large regular expressions

Owing to changes in recent perls (5.8+ I believe), signals no longer interrupt a single opcode's execution. A regex is a single opcode, so the alarm never interrupts it. One solution, as mentioned above, is to use unsafe signals, although I am unsure if it is merely an ENV variable or a compile option. As the name says, these are potentially unsafe as a signal may interrupt an opcode that isn't interruptible and thus crash perl, but this is a very rare case.

Your other options are to use Regexp::Parser to create a new regex with embedded time-checking functionality (by inserting (?{}) blocks), or to fork a separate process and use various means (rlimits, etc.) to limit how long that process may run.
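A minimal sketch of the forking approach, under some assumptions: the helper name match_with_timeout is made up for illustration, and a simple polling loop with kill stands in for rlimits. The child runs the match and reports the result through its exit status; the parent kills it once the deadline has passed.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';          # WNOHANG
use Time::HiRes qw(sleep time);   # sub-second sleep and time

# Hypothetical helper: returns 1 (match), 0 (no match), or undef
# (timed out, child crashed, or the pattern failed to compile).
sub match_with_timeout {
    my ($pattern, $text, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: run the match and encode the outcome in the exit status.
        my $matched = eval { $text =~ /$pattern/ ? 1 : 0 };
        POSIX::_exit(defined $matched ? ($matched ? 0 : 1) : 2);
    }

    # Parent: poll for the child until the deadline passes.
    my $deadline = time + $timeout;
    while (1) {
        if (waitpid($pid, WNOHANG) == $pid) {
            return undef if $? & 127;          # child died from a signal
            my $code = $? >> 8;
            return $code == 0 ? 1 : $code == 1 ? 0 : undef;
        }
        if (time >= $deadline) {
            kill 'KILL', $pid;                 # SIGKILL cannot be deferred
            waitpid($pid, 0);                  # reap the child
            return undef;
        }
        sleep 0.05;
    }
}

my $result = match_with_timeout('\bs.*ls\b', 'signals no longer', 2);
print defined $result ? ($result ? "match\n" : "no match\n")
                      : "timed out or invalid pattern\n";
```

Only a yes/no answer comes back this way; returning captures would need a pipe between child and parent.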

Note that running user defined regexes is HORRIBLY UNSAFE as the user may embed any perl code he wishes in the regex.

Re^2: Losing control of large regular expressions
by grinder (Bishop) on Jan 12, 2005 at 07:18 UTC
    Note that running user defined regexes is HORRIBLY UNSAFE as the user may embed any perl code he wishes in the regex.

    Not true, at least by default. Perl won't let you do that unless you explicitly use re 'eval'. Think of it as tainting for regexps.

    The following script shows this:

    #! /usr/local/bin/perl -w

    use strict;

    my $re = shift || '.';
    $re = qr/$re/;

    while( <DATA> ) {
        print if /$re/;
    }

    __DATA__
    Owing to changes in recent perls (5.8+ I believe), signals no longer
    interrupt a single opcode's execution. A regex is a single opcode, so
    the alarm never interrupts it. One solution, as mentioned above,
    is to use unsafe signals, although I am unsure if it is merely an
    ENV variable or a compile option. As the name says, these are
    potentially unsafe as a signal may interrupt an opcode that isn't
    interruptible and thus crash perl, but this is a very rare case.

    When run, the above produces the following output:

    % ./extreg '\bs.*ls\b'
    Owing to changes in recent perls (5.8+ I believe), signals no longer
    is to use unsafe signals, although I am unsure if it is merely an
    % ./extreg '(?{system "rm -rf *"})'
    Eval-group not allowed at runtime, use re 'eval' in regex m/(?{system "rm -rf *"})/ at ./extreg line 6.

    Perl may be crazy at times, but it is not insane. But yeah, you are right though, it does make me nervous.

    - another intruder with the mooring in the heart of the Perl

      It is true that Perl protects you by default against arbitrary code execution in regular expressions. However, it does not protect you against denial of service, because a regular expression may be crafted so that it does not finish before the heat death of the universe. To give a simple example, based on perlre, the following takes over 1 min on my machine, and the execution time increases exponentially with string length:

      perl -le 'print scalar "12345678901234" =~ /((.{0,5}){0,5}){0,5}[\0]/'
Re^2: Losing control of large regular expressions
by scottb (Scribe) on Jan 12, 2005 at 00:41 UTC
    As per my response to borisz, merely setting the ENV variable did not work. If it's not going to be easily portable, I'm on the hunt for better options.

    The second option sounds interesting, but it would definitely take some work to determine an algorithm for placing the time checks within unknown regexes. The rlimits approach is totally new to me and, based on a little searching, seems complex... but it is another thing to try before giving up.
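    For a sense of what an embedded time check looks like, here is a hand-written sketch. It does not solve the harder problem of placing checks in unknown regexes, and the names $deadline and timed_match are made up for illustration. It also assumes a perl where dying inside a (?{ }) block is well behaved, which is not something to rely on with older releases.

```perl
use strict;
use warnings;

# Package variable rather than a lexical: older perls handled lexicals
# referenced inside (?{ ... }) blocks poorly.
our $deadline;

# The (?{ ... }) block is written in the source itself, so no
# "use re 'eval'" is needed. It runs each time the engine gets past an "a".
my $re = qr/^ (?: a (?{ die "regex timeout\n" if time > $deadline }) )+ b/x;

# Hypothetical helper: returns 1 (match), 0 (no match), undef (deadline passed).
sub timed_match {
    my ($text, $seconds) = @_;
    local $deadline = time + $seconds;
    my $matched = eval { $text =~ $re ? 1 : 0 };
    return $matched if defined $matched;
    die $@ unless $@ eq "regex timeout\n";   # rethrow anything unexpected
    return undef;
}

my $result = timed_match('aaab', 5);
print defined $result ? "finished in time\n" : "gave up\n";
```

    The check only fires where a block was placed, so a pattern that spends its time elsewhere would still run unchecked.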

    Rest assured I am aware of the risks of running user defined regexes and am testing for ?{}. It's also not being used in a 'hostile' environment.

    Thanks

      scottb,
      The ability to change the signal behavior using an environment variable depends on the version of Perl. It works in Perl >= 5.8.1. If you have a Perl that meets that criterion and it is still not working, then the cause is likely something else. You will also want to make sure the variable is exported.

      From perldoc perlipc
      If you want the old signal behaviour back regardless of possible memory corruption, set the environment variable "PERL_SIGNALS" to "unsafe" (a new feature since Perl 5.8.1).

      Cheers - L~R
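      A minimal sketch of the usual alarm wrapper that this setting is meant to rescue. The helper name match_with_alarm is made up for illustration. As far as I know the variable is read when the interpreter starts, so it has to be set in the environment before perl is launched (for example PERL_SIGNALS=unsafe perl yourscript.pl, where yourscript.pl is a placeholder); assigning to $ENV{PERL_SIGNALS} inside the script should not change the running process.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 (match), 0 (no match), undef (alarm fired).
sub match_with_alarm {
    my ($re, $text, $seconds) = @_;
    my $matched;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        # With deferred ("safe") signals the handler only runs after this
        # match opcode has finished; with unsafe signals it can interrupt it.
        $matched = $text =~ $re ? 1 : 0;
        alarm 0;
        1;
    };
    alarm 0;                                   # make sure no alarm is left pending
    return $matched if $ok;
    die $@ unless $@ eq "timeout\n";           # rethrow anything unexpected
    return undef;
}

my $mode = defined $ENV{PERL_SIGNALS} ? $ENV{PERL_SIGNALS} : '(not set)';
print "PERL_SIGNALS is $mode\n";

my $result = match_with_alarm(qr/\bs.*ls\b/, 'signals no longer', 2);
print defined $result ? ($result ? "match\n" : "no match\n") : "timed out\n";
```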

Re^2: Losing control of large regular expressions
by Anonymous Monk on Jan 12, 2005 at 10:26 UTC
    Note that running user defined regexes is HORRIBLY UNSAFE as the user may embed any perl code he wishes in the regex.
    Nope. Not true. That's what Ilya first wanted when he introduced /(?{ })/, but that was quickly shot down by p5p because of its security hazards. Arbitrary code is only executed if either of the following cases is true:
    • /(?{ })/ or /(??{ })/ appears in the source code itself (thus not because of interpolation).
    • use re 'eval'; is in effect.
    Watch:
    $ perl -wle '"" =~ /(?{print "Fooled you!"})/'
    Fooled you!
    $ perl -wle 'use re "eval"; my $re = shift; "" =~ /$re/' '(?{print "Fooled you!"})'
    Fooled you!
    $ perl -wle 'my $re = shift; "" =~ /$re/' '(?{print "Fooled you!"})'
    Eval-group not allowed at runtime, use re 'eval' in regex m/(?{print "Fooled you!"})/ at -e line 1.
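    The runtime error in the last case can be trapped, so a program accepting user patterns can let perl do the rejecting instead of scanning the text for ?{ itself. A minimal sketch, assuming use re 'eval' is not in effect where the pattern is compiled; the helper name compile_user_regex is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled regex, or undef if perl refuses
# the pattern (syntax error, or a code block supplied at runtime).
sub compile_user_regex {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return $re;
}

for my $pattern ('fo+', '(?{print "Fooled you!"})') {
    my $re = compile_user_regex($pattern);
    print defined $re ? "accepted: $pattern\n" : "rejected: $pattern\n";
}
```

    This only covers code execution; a pattern that compiles fine can still take an unreasonable time to run.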