armyk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a variable $format that holds a regular expression string, something like "/reg.expr./" (without quotes :-), e.g. /\d+/ for a number. The string is read from a file. How can I check whether this string is a correct reg. expr. (i.e. reasonable) or not?

I tried eval() and it looks good, but I can't find an incorrect reg. expr. to test it with:

# $format = "/[12345]/";        # I expect strings like this
# $format = "/[[12345]//";      # match "[/" :-]
# $format = "/aa/[[12345]//";   # match "aa/[/"
my $success = 1;
my $test = "x";
if ( $format =~ m!^/(.*)/$! ) {    # test for "/reg.expr./" format
    eval { $test =~ m/$1/ };       # test of "user" reg. exp.
    if ( $@ ) {
        $success = 0;              # $format is incorrect reg. expr.
    }
}
But all my examples look like correct reg. expr.

Could you find some incorrect reg. expr. for me? Is it possible to use eval() for my check? Thanks.

Re: How to identify invalid reg. expr.?
by broquaint (Abbot) on Jun 05, 2002 at 15:35 UTC
    Try something like this
    my $format = "foo (bar] baz";
    eval '$regex = qr/$format/;';
    die "ack - $@" if $@;
    __output__
    ack - Unmatched ( before HERE mark in regex m/foo ( << HERE bar] baz/ at (eval 1) line 1.
    Using the string form of eval() will do runtime compilation of the regex and therefore not break the program if it is invalid. Also note the abbreviation of 'regular expression' is commonly known as regex (and it's a lot easier to pronounce too ;-)
    HTH

    _________
    broquaint

      my $format = "foo (bar] baz";
      eval '$regex = qr/$format/;';
      die "ack - $@" if $@;
      Danger! Danger! If $format contains a slash, that ends the regex and begins Perl again.

      I think you were trying for this:

      my $format = "foo (bar] baz";
      eval { qr/$format/ };
      die "ack - $@" if $@;
      With this form it doesn't matter whether $format contains slashes. Almost every piece of code that uses string-eval is broken, and string-eval is almost never necessary. {grin}

      -- Randal L. Schwartz, Perl hacker


      update: Oh duh. Didn't see the single quotes. In which case, it's overkill, but not dangerous as-is, except that it can be misleading as to why you used single quotes and run-time eval-string instead of compile-time eval-block.
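A minimal sketch of the block-eval approach discussed above, wrapped in a helper. The name compile_regex and its two-value return are illustrative assumptions, not something anyone in the thread posted.

```perl
use strict;
use warnings;

# Hypothetical helper: try to compile $pattern with qr// inside a block
# eval. Returns (compiled_regex, undef) on success or (undef, error_text)
# when the pattern does not compile.
sub compile_regex {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return (undef, $@) unless defined $re;
    return ($re, undef);
}

# The slash in the last pattern is harmless: the variable is interpolated
# into the pattern, not into Perl source code.
for my $pattern ('[12345]', 'foo (bar] baz', 'a/b') {
    my ($re, $err) = compile_regex($pattern);
    print defined $re ? "ok:  $pattern\n" : "bad: $pattern\n";
}
```

Returning the compiled object rather than a bare true/false lets the caller reuse it for the real match instead of compiling the pattern a second time.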
Re: How to identify invalid reg. expr.?
by mephit (Scribe) on Jun 05, 2002 at 17:11 UTC
    I have the following code in a CGI thing I've been working on off-and-on:
    my $searchstring = $obj->param('words');
    eval { no warnings; "" =~ /$searchstring/ };
    if ($@) { ... }
    Something like "***" entered will raise some flags, as "*" is a quantifier and must follow something other than another quantifier. A single left paren or an "unclosed" character class (a missing right bracket) will also do it. I'm sure there are other regexen that will "break" this.

    I should point out that I got the idea for using that method from here somewhere, but I can't find the node in question. I also don't know the ins and outs of eval (block vs. string, for example), but this solution works for me. YMMV. HTH.
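As a rough illustration, the three kinds of broken pattern mentioned above can be run through the same eval-and-match check. The %error hash and the literal '[abc' are assumptions made for this sketch; the exact error wording varies between Perl versions, so none is shown.

```perl
use strict;
use warnings;

# Patterns of the kinds named in the post above: a run of quantifiers,
# a lone left paren, and an unclosed character class.
my @suspect = ('***', '(', '[abc');
my %error;    # pattern => compile error text (hypothetical bookkeeping)

for my $pattern (@suspect) {
    # Block eval traps the runtime compilation failure; the string being
    # matched against is irrelevant here.
    my $ok = eval { no warnings; "" =~ /$pattern/; 1 };
    $error{$pattern} = $@ unless $ok;
}

for my $pattern (@suspect) {
    if (exists $error{$pattern}) {
        print "rejected: $pattern\n    $error{$pattern}";
    } else {
        print "compiled: $pattern\n";
    }
}
```

Ending the eval block with a literal 1 makes success easy to tell apart from failure without relying on the match result, which is false here even for valid patterns that simply do not match the empty string.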

    --

    There are 10 kinds of people -- those that understand binary, and those that don't.

      **** sound of gongs and security buzzers going off, **** **** red lights flashing, robots running in circles ****

      That code is one heck of a security vulnerability waiting to happen. What happens when I pass:

      (?{ dump })

      as the contents of the words parameter?

      Of course, you might have protected this functionality from untrusted access, but since I don't know that you have I think a warning is in order.

      -sam

        At my insistence on the P5P mailing list, the inline-execute feature was restricted to prevent this. From perldoc perlre:
        For reasons of security, this construct is forbidden if the regular expression involves run-time interpolation of variables, unless the perilous "use re 'eval'" pragma has been used (see the re manpage), or the variables contain results of "qr//" operator (see the qr/STRING/imosx entry in the perlop manpage).

        This restriction is because of the wide-spread and remarkably convenient custom of using run-time determined strings as patterns. For example:

            $re = <>;
            chomp $re;
            $string =~ /$re/;

        Before Perl knew how to execute interpolated code within a pattern, this operation was completely safe from a security point of view, although it could raise an exception from an illegal pattern. If you turn on the "use re 'eval'", though, it is no longer secure, so you should only do so if you are also using taint checking. Better yet, use the carefully constrained evaluation within a Safe module. See the perlsec manpage for details about both these mechanisms.
        I forced the issue when Ilya was initially hesitant by saying that I would have a CERT warning prepared against Perl 5.6.0 if this feature went in without the restriction, as it would open up holes worldwide to many naive sites.

        -- Randal L. Schwartz, Perl hacker
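The restriction quoted above can be observed with a small sketch. The variable names and the harmless embedded assignment are assumptions chosen for illustration, standing in for a hostile payload; the script deliberately does not enable "use re 'eval'".

```perl
use strict;
use warnings;

# Package variable that the embedded code would set if it were allowed to run.
our $executed = 0;

# Stand-in for untrusted input containing an inline-code construct.
my $untrusted = '(?{ $main::executed = 1 })';

# The pattern is interpolated at run time, so without "use re 'eval'" Perl
# should refuse to compile it and die; the block eval catches that.
my $ok    = eval { "x" =~ /$untrusted/; 1 };
my $error = $ok ? '' : $@;

print $ok ? "pattern was accepted\n" : "refused: $error";
print "embedded code executed: ", ($executed ? "yes" : "no"), "\n";
```

This only demonstrates the default behaviour; a pattern that compiles can still be expensive to run, so it is not a complete answer to handling untrusted patterns.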

        Eep! I hadn't even thought of that. I just ran that in the browser, and got my "invalid regex" warning. After examining that bit, it looks like it *should* dump core, but it doesn't. Maybe something in my system (apache configuration, security configuration, quotas, something like that) is preventing the core from being dumped.

        I just ran the script through the debugger, and it turns out that $@ contains the following:

        104:    if ($@) {
          DB<2> x $@
        0  '/(?{ dump })/: Eval-group not allowed at runtime, use re \'eval\' at dbsearch.pl line 103.
        '
          DB<3>
        I have no idea what this means. Like I said in my earlier post, I don't know the finer points of using eval, or what's causing it to not dump core. Anyway, how can I make this safer? (I plan to post the entire script for a review one of these years, after I tweak one or two more things, and find a place to host the script.)

        --

        There are 10 kinds of people -- those that understand binary, and those that don't.