aplonis has asked for the wisdom of the Perl Monks concerning the following question:

I've got a Perl/Tk script that lets users type any old RegEx into a text widget. That RegEx is then used as a file filter. I can't control what they type in, so I need to test their RegEx for validity before it's employed.

That is, I want a sub that returns a 1 when the user's RegEx is valid, 0 otherwise. Like so:

sub re_valid_test {
   my ($re) = @_;
   my $bool = 0;

   ...some way of setting $bool to 1 if $re is valid...

   return $bool;
}

Perl already knows how to fault an invalid RegEx. How do I tap into that?

Replies are listed 'Best First'.
Re: Test RegEx Validity
by Your Mother (Archbishop) on Nov 28, 2004 at 01:41 UTC

    I like simonm's and especially Fletch's tricks. I'll also point out that it's possible to write, intentionally or accidentally, a colossally bad regular expression that might take hours or decades to match. So if this isn't a personal-box project, you should probably wrap the actual execution of the expression in a timeout alarm, something like this (mostly from the Cookbook):

    eval {
        local $SIG{ALRM} = sub { die "Bad rx timed out" };
        alarm 5;    # start the timer
        eval {
            # the user's rx is run here...
        };
        alarm 0;    # clear the timer
    };
    alarm 0;        # in case something else went wrong
    die $@ if $@ and $@ !~ /rx timed out/;

      At least on recent versions of Perl, alarm won't time out a regex call. Your only real choice is to use one of the regex parser modules and create your own regex that has a call to a timeout sub after every op.
        This regex timeout issue, in a slightly different context, is a problem I am having as well. I appreciate the light being shed by all the sages, even though my humble novice mind is still struggling in the swamp of Confusion.

        My situation is that:

        1) I didn't really understand BUU's last post.

        2) I am intimidated by Regexp::Parser

        3) I am looking for a general solution for the issue of what to do when sometimes a program takes too long for whatever reason and you need to give up and forget it.

        Rather than slink away into the shadows, I followed BUU's breadcrumbs with Google searches:

        http://www.google.de/search?hl=de&q=perl+%22timeout+sub%22&btnG=Google-Suche&meta=

        and

        http://www.google.de/search?hl=de&q=perl+%22unreliable+program%22&btnG=Suche&meta=

        which brought me back to perlmonks:

        http://perlmonks.thepen.com/220132.html

        "Using Expect.PM to manage an unreliable program."

        Is this a corridor that we novices who wish to manage regex timeout problems would do well to explore... or will it only confuse us more?

        Humbly,

        Thomas.
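
One general way to give up on work that takes too long, sketched here under some loud assumptions: run the risky match in a forked child process and let the parent kill it after a deadline. This sidesteps the problem of alarm not interrupting a running regex, because the parent is only waiting in waitpid, not matching. The helper name match_with_timeout and its return convention are invented for this illustration, and the sketch assumes a Unix-like system where fork() and signals behave conventionally. It is not the Expect.pm approach from the linked node.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not from the thread): returns 1 on a match, 0 on no
# match, and undef if the pattern is invalid or the match ran out of time.
sub match_with_timeout {
    my ($pattern, $text, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compile and run the match, report through the exit status.
        my $ok = eval { $text =~ /$pattern/ ? 1 : 0 };
        # _exit skips END blocks and destructors inherited from the parent.
        POSIX::_exit(defined $ok ? ($ok ? 0 : 1) : 2);
    }

    # Parent: wait for the child, but only for $seconds.
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        waitpid($pid, 0);
        alarm 0;
    };
    if ($@) {
        alarm 0;
        die $@ unless $@ eq "timeout\n";
        kill 'KILL', $pid;    # stop the runaway match
        waitpid($pid, 0);     # reap the child
        return undef;
    }

    return undef if $? & 127;    # child was killed by some other signal
    my $status = $? >> 8;
    return $status == 0 ? 1 : $status == 1 ? 0 : undef;
}

my $result = match_with_timeout('\.txt$', 'notes.txt', 5);
print defined $result ? "result: $result\n" : "gave up or invalid pattern\n";
```

The cost is one process per match, which may be too heavy for filtering thousands of file names one by one; running the whole filtering pass inside a single child is a possible variation.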

Re: Test RegEx Validity
by Zaxo (Archbishop) on Nov 28, 2004 at 01:58 UTC

    See warnings below, but you can test validity of a string as a regex without running a match on it. You can do that by eval-ing the qr// operator,

    sub re_valid_test {
        my $re = eval { qr/$_[0]/ };
        defined($re) ? 1 : 0 ;
    }

    If you're willing to accept undef instead of zero as false, you don't need the trinary. Just leave the last line as defined $re;

    Your less amiable users may enjoy handing you regexen containing (?{...}) and (??{...}) constructs. You might not enjoy it so much, because you would be giving users the ability to execute arbitrary Perl code. Of course for shell users that is the case anyway, but this would be very bad in a suEXEC'd web app.

    You need to test ought-to-be-tainted input for safety as well as validity. That is not easy. You need a parser for regular expressions. That will test validity as well as giving you a chance to reject code constructs.
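
As a partial illustration of the code-construct concern: Perl refuses to compile (?{...}) and (??{...}) in a pattern that is interpolated at run time unless use re 'eval' is in effect, so the eval/qr check already turns those away, provided that pragma is not enabled where the pattern is compiled. The sketch below restates re_valid_test in full so it runs on its own; the candidate patterns are made up for the demonstration. It says nothing about patterns that are merely slow.

```perl
use strict;
use warnings;

# Deliberately no "use re 'eval'" anywhere in this file: with that pragma
# in scope, the code constructs below would compile and could run.
sub re_valid_test {
    my ($str) = @_;
    my $re = eval { qr/$str/ };
    return defined($re) ? 1 : 0;
}

# Illustrative inputs: a sane filter, a syntax error, an embedded code block.
for my $candidate ('\.txt$', '(unclosed', '(?{ print "gotcha\n" })') {
    printf "%-28s %s\n", $candidate,
        re_valid_test($candidate) ? 'accepted' : 'rejected';
}
```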

    After Compline,
    Zaxo

      Perl's native boolean false, as for example returned by defined, is never undef. Instead it is the dualvar value (numeric: 0, string: "").

      You can construct your own dualvar values using the function dualvar() from Scalar::Util. Fun for the whole family! :)

        Simpler than Scalar::Util's dualvar() would be to use the expression: !1.
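
A small sketch of the point about false values, assuming nothing beyond core Perl and Scalar::Util. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

my $false = defined(undef);    # what defined returns for "no"

# The false value is itself defined, and behaves as "" and as 0.
print "the result of defined() is defined\n" if defined $false;
printf "defined(undef): string '%s', number %d\n", $false, $false;

my $also_false = !1;               # the short spelling of the same value
my $home_made  = dualvar(0, "");   # built by hand: numeric part, string part

printf "!1:             string '%s', number %d\n", $also_false, $also_false;
printf "dualvar(0, ''): string '%s', number %d\n", $home_made, $home_made;
```

None of these trigger a "isn't numeric" warning in numeric context, which a plain empty string would.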

Re: Test RegEx Validity
by simonm (Vicar) on Nov 28, 2004 at 01:12 UTC
    Use eval to catch the fault:
    eval { "" =~ /$re/; $bool = 1 };

      There's no need to actually use it in the eval, and you can save off the result and use it after the check.

      sub validate_re {
          my $re = shift;
          eval { $re = qr/$re/ };
          return $@ ? undef : $re;
      }

      That will return either an already compiled regexp or undef (and $@ will indicate what was wrong in that case).
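
A sketch of how that return value might be used as the file filter from the original question. The sample pattern and file names are invented for the example; in the real script they would come from the text widget and a directory listing.

```perl
use strict;
use warnings;

sub validate_re {
    my $re = shift;
    eval { $re = qr/$re/ };
    return $@ ? undef : $re;
}

# Hypothetical inputs standing in for the widget text and a file listing.
my $user_input = '\.p[lm]$';
my @files      = qw(filter.pl Notes.txt Widget.pm README);

my @matching;
my $rx = validate_re($user_input);
if (defined $rx) {
    # The pattern is already compiled, so it can be applied directly.
    @matching = grep { $_ =~ $rx } @files;
    print "$_\n" for @matching;
}
else {
    warn "Invalid pattern: $@";
}
```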

Re: Test RegEx Validity
by Prior Nacre V (Hermit) on Nov 29, 2004 at 04:42 UTC

    That can be dangerous. Here's a valid regex:

    [ ~/tmp ] $ perl -wle 'use strict; my $x = qr(s/.*/system("rm *")/e); print $x;'
    (?-xism:s/.*/system("rm *")/e)
    [ ~/tmp ] $

    You're probably going to need additional regex validation.

    Regards,

    PN5