cLive ;-) has asked for the wisdom of the Perl Monks concerning the following question:

Oh keepers of the faith...

If I want to use a user input string as a regular expression, what checks should I make? I think I only need to escape / and newline characters.

Obviously, it would be pointless to use quotemeta here :), but are there any other characters that I really should escape before using the string in eval?

Er, this is the sort of thing I was thinking of:

$input = '...'; # input from user
$value = '...'; # String I want to test to see if $input matches

# escape s/delimiter/
$input =~ s|/|\/|gs;

# strip new lines - not expected in input anyway
$input =~ s/\n//gs;

my $match;
eval q{
    $value =~ /($input)/; # try match
    $match = $1;          # assign here because of local scope?
};

if ($@ ne '') {
    # error because invalid reg exp sent
} else {
    if ($match) {
        # regexp matched
    } else {
        # reg exp not matched
    }
}
I *think* this is sound, but was wondering if anyone can drill any holes in it before I go any further down this route...

cLive ;-)

PS - perhaps I should wait and do this in Parrot instead? :)

Re: Checking user input on dynamic regular expressions
by davorg (Chancellor) on Apr 02, 2001 at 12:35 UTC

    You could always use the qr// operator to compile the regex and check the success of the compilation using eval.

    my $pat = <STDIN>;   # random nonsense from a user
    chomp $pat;          # drop the trailing newline, or it becomes part of the pattern
    my $re = eval { qr/$pat/ };
    die "Nasty regex!: $@\n" if $@;
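    A minimal, self-contained sketch of the same qr//-inside-eval idea, wrapped in a helper. The sub name compile_user_regex and the sample patterns are hypothetical, chosen only for illustration; the helper returns the compiled regex, or undef plus the compiler's error text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compile a user-supplied pattern
# with qr// inside a block eval. On success it returns (regex, undef); on
# failure it returns (undef, error message from the regex compiler).
sub compile_user_regex {
    my ($pat) = @_;
    my $re = eval { qr/$pat/ };
    return ($re, $@ || undef);
}

# One valid pattern and two that the regex compiler rejects.
for my $pat ('\d{4}', 'a(b', '[z-a]') {
    my ($re, $err) = compile_user_regex($pat);
    if (defined $re) {
        print "'$pat' compiled OK\n";
    } else {
        print "'$pat' rejected: $err";
    }
}
```

    Because the block form eval { } is used rather than a string eval, no Perl code from the user is ever parsed; only the regex compiler sees the text.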
    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

Re: Checking user input on dynamic regular expressions
by Anonymous Monk on Apr 02, 2001 at 12:24 UTC
    I don't know why you think you need to escape the / character. You're using the q quoting mechanism, so $input is not interpolated when the eval string is built. It is the regex compiler, not the Perl parser, that sees the /, so if you're worried about a / terminating the regex, that won't happen. That said, if you do escape a character, you should also escape the \ character: in your example, an $input of '\/' would become '\\/', in which the / is no longer escaped.
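    A small sketch of the point above, assuming sample values for $input and $value: a / that is part of a variable's value does not end the match, even when the pattern is built inside a string eval as in the original question.

```perl
use strict;
use warnings;

# The slash is part of $input's value, so the Perl parser never treats it
# as the closing delimiter; the regex engine simply receives "a/b".
my $input = 'a/b';
my $value = 'xxa/byy';

# Same shape as the question's code: the match runs inside a string eval,
# and $input is interpolated only when the regex itself is compiled.
my $match;
eval q{
    $match = $1 if $value =~ /($input)/;
    1;
} or die "eval failed: $@";

print defined $match ? "matched: $match\n" : "no match\n";
```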
      oops, good point. Earlier, I was messing with an interpolated version and slipped that in for safety.

      Guess that makes most of my above question irrelevant :)

      ah well

      cLive ;-)

Re: Checking user input on dynamic regular expressions
by chipmunk (Parson) on Apr 02, 2001 at 19:27 UTC
    You shouldn't have to escape anything, which works out well, because the substitution: s|/|\/|gs; is basically a no-op. :) (You'd need an extra backslash in the replacement to make that do what you intended.)
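    A quick sketch showing why that substitution changes nothing, and what the doubled backslash does instead. The sample string is an assumption for illustration.

```perl
use strict;
use warnings;

my $orig = 'a/b';

# In the replacement part, "\/" is just an escaped slash, i.e. "/",
# so every "/" is replaced by "/" and the string is unchanged.
(my $noop = $orig) =~ s|/|\/|g;

# With a doubled backslash the replacement is a literal backslash
# followed by a slash, which is what the question intended.
(my $escaped = $orig) =~ s|/|\\/|g;

print "$noop\n";     # a/b
print "$escaped\n";  # a\/b
```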

    Just test the regex in an eval to make sure it compiles, as suggested earlier. And do not do the following:

        use re 'eval';

    because that allows (?{ ... }) code blocks interpolated from the user's pattern to run, so you could end up evalling arbitrary code in the user's regex.
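    A sketch of the default protection, assuming a made-up hostile pattern: without use re 'eval' in scope, Perl refuses to compile a (?{ ... }) code block that arrives through variable interpolation, so the same eval { qr// } check traps it as an error.

```perl
use strict;
use warnings;

# A "user" pattern that tries to smuggle in Perl code via (?{ ... }).
my $pat = '(?{ print "code ran!\n" })x';

# No 'use re "eval"' here, so compiling the interpolated code block dies
# and the block eval catches the error instead of running the code.
my $re  = eval { qr/$pat/ };
my $err = $@;

if (defined $re) {
    print "compiled (unexpected)\n";
} else {
    print "refused: $err";
}
```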

Re: Checking user input on dynamic regular expressions
by Tyke (Pilgrim) on Apr 02, 2001 at 13:04 UTC
    I think you might want to pass $input through the quotemeta function in order to make sure that any regexp special characters are correctly escaped.

    I don't see why it would be pointless...

    Update: Ack, of course it's pointless.

    add_to_todo_list(q(Think _BEFORE_ posting));
    Begs pardon...
      $reg_exp = '\d{4}';
      $string  = 'The year is 2001, the month is April.';
      $reg_exp = quotemeta $reg_exp;
      $string =~ /($reg_exp)/;
      print $1;
      That doesn't print '2001', because after quotemeta the pattern is \\d\{4\}, which matches the literal text \d{4} rather than four digits.
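      A runnable sketch of the same example, plus the case where quotemeta is the right tool (matching the user's text literally). The extra sample string 'pattern: \d{4}' is an assumption for illustration.

```perl
use strict;
use warnings;

my $reg_exp = '\d{4}';
my $string  = 'The year is 2001, the month is April.';

# Used as a regex: \d{4} means "four digits".
my ($as_regex) = $string =~ /($reg_exp)/;

# After quotemeta the pattern text is \\d\{4\}, which only matches the
# literal characters \d{4}, so this match fails on $string.
my $quoted = quotemeta $reg_exp;
my ($as_literal) = $string =~ /($quoted)/;

# quotemeta is appropriate when the user's text should match literally.
my ($literal_hit) = 'pattern: \d{4}' =~ /($quoted)/;

print "regex:   ", $as_regex    // 'no match', "\n";  # 2001
print "quoted:  ", $as_literal  // 'no match', "\n";  # no match
print "literal: ", $literal_hit // 'no match', "\n";  # \d{4}
```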

      cLive ;-)

Re: Checking user input on dynamic regular expressions
by cLive ;-) (Prior) on Apr 04, 2001 at 03:19 UTC
    Update: in making the code live, I spotted that the eval needs to change; otherwise $match is set to the $1 left over from the last successful match, i.e.:
    eval q{
        if ($value =~ /($input)/) { # try match
            $match = $1;            # assign here because of local scope?
        }
    };
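    A small sketch of the stale-$1 problem described in the update, using made-up sample strings. A failed match leaves $1 from the previous successful match in place; reading $1 only after a successful match, or using a list-context match (which returns the captures, or an empty list on failure), avoids it.

```perl
use strict;
use warnings;

# A successful match sets $1 ...
my $first = 'abc';
$first =~ /(b)/ or die "setup match failed";

# ... and a later failed match leaves it untouched, so code that reads
# $1 unconditionally sees the stale 'b' here.
my $value = 'xyz';
my $input = 'q';
$value =~ /($input)/;
my $stale = $1;

# Guarded assignment: only read $1 when the match succeeded.
my $match;
if ($value =~ /($input)/) {
    $match = $1;
}

# List-context match: captures on success, empty list on failure.
my ($match2) = $value =~ /($input)/;

print defined $stale  ? "stale: $stale\n"   : "stale: undef\n";   # stale: b
print defined $match  ? "match: $match\n"   : "match: undef\n";   # match: undef
print defined $match2 ? "match2: $match2\n" : "match2: undef\n";  # match2: undef
```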

    I'm pretty happy with that now :)

    cLive ;-)