ManFromNeptune has asked for the wisdom of the Perl Monks concerning the following question:

In the following code (web CGI app), I enable users to define some text and a search-and-replace regex that they wish to run. It's important to be able to include backreferences in the replacement text. I've been able to accomplish this with eval(), but unfortunately it is a performance problem. I also tried using the /e (evaluation) switch, but that did not permit variable interpolation in the replacement text.

So my big question is: how do I do this *without* eval()?

Here's my code sample:
$user_defined_string  = "abcabcabc";
$user_defined_search  = '(a)';
$user_defined_replace = '---$1---';
print "before: $user_defined_string\n";
eval("\$user_defined_string =~ s/$user_defined_search/$user_defined_replace/gs;");
print "after: $user_defined_string\n";
Note that $user_defined_string, $user_defined_search, and $user_defined_replace change for each invocation.

thanks!
-Nept

Re: How to do regex backreferences within $variable replacement text?
by Zaxo (Archbishop) on Sep 17, 2005 at 19:30 UTC

    This is a dangerous application to put on the web. You give the user an opportunity to run arbitrary code in (?{...}) or (??{...}) constructs in the regex. With the /e switch, arbitrary code can also be run in the replacement string.

    After Compline,
    Zaxo
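A small illustration of the risk, assuming the eval() technique exactly as posted in the question: besides (?{...}) and /e, a user can simply end the s/// early and append statements of their own. The $injected flag is made up for the demonstration; a real attacker would run something much less polite.

```perl
use strict;
use warnings;

our $injected = 0;    # demo flag: flipped only if the user's code runs

my $user_defined_string  = "abcabcabc";
my $user_defined_search  = '(a)';
# Hostile "replacement": it ends the s/// early, adds a statement of its
# own, and comments out the rest of the generated source with '#'.
my $user_defined_replace = '---$1---/; $main::injected = 1; #';

# The technique from the question: paste user input into Perl source and eval it.
eval("\$user_defined_string =~ s/$user_defined_search/$user_defined_replace/gs;");

print "after:    $user_defined_string\n";   # only the first 'a' replaced (/gs was commented out)
print "injected: $injected\n";
```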

      re: Security, that's half of the reason why I would prefer not to use eval() at all (the other half is performance, since the eval'd string is recompiled each time it's executed at runtime.)
        Compilation time is going to be there no matter what solution you use; something needs to figure out which characters are plain text and which are part of the name of a variable to embed, and something must actually do the embedding. True, some compilers are faster than others, but I suspect that perl is very quick at compiling a string literal, especially since it's already loaded in memory.

      Perl prohibits runtime compilation of regexps that use those features unless use re 'eval' has also been used.
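A sketch of that rule in action, assuming use re 'eval' is not in effect. Note the flip side: an eval()-based approach pastes the user's pattern into Perl source, where it becomes a literal pattern, so this protection does not apply there. The $ran flag exists only for the demonstration, and the block eval is used solely to trap the error (it evaluates no user text).

```perl
use strict;
use warnings;

our $ran = 0;                                  # demo flag set by the embedded code
my $user_pattern = '(?{ $main::ran = 1 })a';   # "user-supplied" pattern text

# 1. Interpolated into a match at run time: Perl refuses to compile it,
#    since use re 'eval' is not in effect. The block eval only traps the error.
my $interp_ok  = eval { 'abc' =~ /$user_pattern/; 1 };
my $interp_err = $@;

# 2. Pasted into Perl source, as the eval()-based approach does: the pattern
#    is now a literal, so the embedded code is allowed to run.
my $literal_ok = eval "'abc' =~ /$user_pattern/; 1";

print $interp_ok  ? "interpolated: accepted\n" : "interpolated: rejected: $interp_err";
print $literal_ok ? "string eval: accepted, ran = $ran\n" : "string eval: rejected: $@";
```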

Re: How to do regex backreferences within $variable replacement text?
by Tanktalus (Canon) on Sep 17, 2005 at 21:41 UTC

    Here's some example code on how to do this with a few tests. I'd invite adding some more tests and any bug reports/security issues as I've not really thought about this from a security perspective yet. It's a bit slower in that it matches twice, but it completely avoids any eval.

    use strict;
    use warnings;

    sub substitute {
        my ($string, $from, $to) = @_;
        $from = qr/$from/ unless ref $from and ref $from eq 'Regexp';
        my @a = $string =~ $from;
        $to =~ s/\$(\d+)/$a[$1-1]/g;    # was $to =~ s/\$(\d+)/\Q$a[$1-1]/g;
        $string =~ s/$from/$to/;
        $string;
    }

    my @tests = (
        [ "this is some test", "(is) s(o)me", '$1 n$2t a' ],
        [ "this is some test", "is some",     'is not a'  ],
    );
    for my $t (@tests) {
        print "[$t->[0]]...";
        print "[", substitute(@$t), "]\n";
    }
    prints out:
    [this is some test]...[this is not a test]
    [this is some test]...[this is not a test]
    which is what I expected. But, as you can see, it's not a very extensive test, so feel free to try a few more.

    Update: It turns out that the \Q in the $to replacement wasn't needed.

      Ok, this appears to be working, except for one thing: in the backreferenced sections, spaces (and, as the output shows, commas and hyphens too) are getting a backslash prepended in $to, and subsequently in $string. Here's a test I added:
      [ 'Once upon a time, Jack Roush was not the king of the NASCAR garage, but a stock-car outsider from Michigan trying to start a Winston Cup team with a small-time budget.',
        '(king of the NASCAR )(garage, but a stock-car)',
        'HERE1$1HERE2$2HERE3' ]
      And here's the output:
      [Once upon a time, Jack Roush was not the king of the NASCAR garage, but a stock-car outsider from Michigan trying to start a Winston Cup team with a small-time budget.] ... [Once upon a time, Jack Roush was not the HERE1king\ of\ the\ NASCAR\ HERE2garage\,\ but\ a\ stock\-carHERE3 outsider from Michigan trying to start a Winston Cup team with a small-time budget.]

      Any ideas? I can't see where these backslashes are coming from...!
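A minimal reproduction, assuming the captures from the test above: the backslashes come from the \Q in the first version of substitute(), which Tanktalus's update removed. \Q applies quotemeta, which backslash-escapes every ASCII non-word character, and because variable contents are not re-processed when $to is interpolated into the final s///, those backslashes end up in $string.

```perl
use strict;
use warnings;

# What '(king of the NASCAR )(garage, but a stock-car)' captures in the test above.
my @a  = ('king of the NASCAR ', 'garage, but a stock-car');
my $to = 'HERE1$1HERE2$2HERE3';

# First version: \Q (quotemeta) escapes spaces, commas, hyphens, ...
(my $with_q = $to) =~ s/\$(\d+)/\Q$a[$1-1]/g;

# Updated version: the captured text is inserted as-is.
(my $without_q = $to) =~ s/\$(\d+)/$a[$1-1]/g;

print "$with_q\n$without_q\n";
```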

      When allowing the embedding of code (loosely defined), provide an escape mechanism!!! For example,

      • I have no means of replacing with $1 . "00". Perl uses "${1}00".
      • I have no means of replacing with '$1.00'. Perl uses the literal "\$1.00".

      When adding your escape mechanism, be careful not to break existing functionality. For example,

      • Continue allowing me to replace with '\$1'. Perl uses "\\$1".
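For reference, a quick check of the Perl behaviour the bullets cite, using a throwaway string "abc" and a single capture group:

```perl
use strict;
use warnings;

# Perl's own replacement syntax for the three cases in the bullets.
(my $digits = "abc") =~ s/(b)/${1}00/;   # group 1, then a literal "00"
(my $dollar = "abc") =~ s/(b)/\$1.00/;   # the literal text "$1.00"
(my $backsl = "abc") =~ s/(b)/\\$1/;     # a literal backslash, then group 1

print "$digits\n$dollar\n$backsl\n";
```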

      Update: A solution is to replace

      $to =~ s/\$(\d+)/$a[$1-1]/g;

      with

      $to =~ s/\\(.)|\$\{(\d+)\}|\$(\d+)/ defined $1 ? $1 : defined $2 ? $a[$2-1] : $a[$3-1] /eg;
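Putting the pieces together: a sketch of Tanktalus's substitute() with the escape handling above, offered as an illustration rather than a battle-tested routine. The $group helper and the handling of patterns without capture groups are additions not found in the original posts. The /e here runs only the fixed expression written in the code; the user's text is treated purely as data.

```perl
use strict;
use warnings;

# Sketch only: Tanktalus's substitute() plus the escape handling above.
# Replacement syntax, mirroring Perl's own:
#   $N    text of capture group N
#   ${N}  the same, but digits may follow, e.g. '${1}00'
#   \X    a literal X, e.g. '\$' gives '$' and '\\' gives '\'
sub substitute {
    my ($string, $from, $to) = @_;
    $from = qr/$from/ unless ref $from eq 'Regexp';

    # Captures of the first match; leave the string alone if nothing matches.
    my @caps = $string =~ $from or return $string;
    @caps = () if $#+ == 0;    # no groups: a successful match returns (1)

    # Hypothetical helper: text of group $n, or '' if absent or unmatched.
    my $group = sub {
        my ($n) = @_;
        return ($n >= 1 && defined $caps[$n - 1]) ? $caps[$n - 1] : '';
    };

    # This /e runs only the fixed expression below; the user's text is data.
    $to =~ s/\\(.)|\$\{(\d+)\}|\$(\d+)/defined $1 ? $1 : $group->(defined $2 ? $2 : $3)/egs;

    $string =~ s/$from/$to/;    # $to is inserted verbatim, not re-interpolated
    return $string;
}

print substitute('this is some test', '(is) s(o)me', '$1 n$2t a'), "\n";
print substitute('abc', '(b)', '${1}00'), "\n";
print substitute('abc', '(b)', '\$1.00'), "\n";
```

Like the original, it replaces only the first match; the captures are taken from that match alone.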
Re: How to do regex backreferences within $variable replacement text?
by GrandFather (Saint) on Sep 17, 2005 at 19:11 UTC

    Use the evaluate switch:

    use warnings;
    use strict;

    my $user_defined_string  = "abcabcabc";
    my $user_defined_search  = '(a)';
    my $user_defined_replace = '"---".$1."---"';

    print "before: $user_defined_string\n";
    $user_defined_string =~ s/$user_defined_search/$user_defined_replace/ee;
    print "after: $user_defined_string\n";

    prints:

    before: abcabcabc
    after: ---a---bcabcabc
    Update: Fix the $user_defined_replace string

    BTW: are you aware that your user can execute pretty much any code using this technique? You may want to do some aggressive filtering on the expressions that are allowed, and that may be pretty tricky to do!


    Perl is Huffman encoded by design.
      Hmm... that didn't work either... it prints:

      before: abcabcabc
      after: "---".$1."---"bcabcabc


      And re: security issues around executing any code, this is another reason I was hoping to avoid eval() or any of its close relatives!

      As another possible idea, is there a way to precompile the replacement text of a regular expression, sort of like what qr// does for you with the search portion?
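One way to get the effect of a "precompiled" replacement without any eval, sketched under the assumption that only $N references need to be supported: parse the template once into literal text and group numbers, then apply it to every match, recovering each match's own captures from @- and @+. compile_replacement and substitute_all are hypothetical names, not a library API.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a template like '---$1---' ONCE into literal
# text and group numbers; return a closure that builds the replacement
# from a list of captures. User text is never run as code.
sub compile_replacement {
    my ($template) = @_;
    my @pieces;
    # A capturing split keeps the '$N' tokens in the result list.
    for my $tok (split /(\$\d+)/, $template) {
        next if $tok eq '';
        push @pieces, $tok =~ /\A\$(\d+)\z/ ? [ group => $1 ] : [ text => $tok ];
    }
    return sub {
        my @caps = @_;    # groups 1..N of one match
        my $out  = '';
        for my $piece (@pieces) {
            my ($kind, $val) = @$piece;
            if ($kind eq 'text') {
                $out .= $val;
            }
            elsif ($val >= 1 && defined $caps[$val - 1]) {
                $out .= $caps[$val - 1];
            }
        }
        return $out;
    };
}

# Hypothetical driver: apply a compiled pattern and compiled replacement to
# every match, using @- and @+ to recover that match's capture groups.
sub substitute_all {
    my ($string, $re, $expand) = @_;
    my ($out, $pos) = ('', 0);
    while ($string =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);
        my @caps = map {
            defined $-[$_] ? substr($string, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;
        $out .= substr($string, $pos, $start - $pos) . $expand->(@caps);
        $pos = $end;
    }
    return $out . substr($string, $pos);
}

my $search  = qr/(a)/;                          # compiled once
my $replace = compile_replacement('---$1---');  # parsed once
print substitute_all('abcabcabc', $search, $replace), "\n";
```

Because each match's captures are read separately, repeated matches with different text (for example "more than one" and "more or less than one") each get their own backreferences.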
        GrandFather missed another e. GrandFather has since fixed it:

        $user_defined_string =~ s/$user_defined_search/$user_defined_replace/eeg;
        __END__
        before: abcabcabc
        after: ---a---bc---a---bc---a---bc

        Now back to your security issue: here is a simple replacement string that gives you the username the script runs as. In other words, it is really dangerous, as pointed out by Zaxo and GrandFather:

        my $user_defined_replace = '`whoami`';

        before: abcabcabc
        after: xxx
        bcxxx
        bcxxx
        bc

        Note: in the above xxx stands for the username

        Update: I might be wrong, but I cannot see a nice way to handle user-defined substitutions... If you let their input become part of your script (i.e. they supply code to be executed inside your script), then they can do whatever they want... A better approach would be to look through the string they send you, check for potentially harmful constructs like backticks and other operators, and refuse to execute it if any are present.

        Sorry, coffee effect still applies: it needs two eval switches (now updated).

        You can't do it without evaluation in some form. You could parse the replaced string for $n's and then replace those with their respective captured text. I'll post something in a while


        DOH, just realized that you had "/ee" ... tried that and it did indeed work :) But this is still basically an eval(), right?
      I tried that, unfortunately it didn't work. I got:

      before: abcabcabc
      after: ---$1---bc---$1---bc---$1---bc

      The "$1" is getting interpreted literally, not as a backreference.
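On the "basically an eval()" question: yes. A sketch of what each e does, using a throwaway "abcabc" input: the first e evaluates $replace as an expression, which just yields its text; the second e runs that text through eval as Perl code. That is why '---$1---' comes back literally with a single e, and why any code the user types gets executed with two.

```perl
use strict;
use warnings;

my $replace = '"---".$1."---"';    # user input that happens to be a Perl expression

# One e: the replacement side is evaluated as the expression $replace,
# which simply yields its text.
(my $one_e = "abcabc") =~ s/(a)/$replace/eg;

# Two e's: that text is then eval()ed as Perl code, with $1 still set.
(my $two_e = "abcabc") =~ s/(a)/$replace/eeg;

print "one e: $one_e\n";
print "two e: $two_e\n";
```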
Re: How to do regex backreferences within $variable replacement text?
by GrandFather (Saint) on Sep 18, 2005 at 00:48 UTC

    Finally got a moment away from child minding :). Here's a non-eval technique:

    use warnings;
    use strict;

    my $udStr = "abcabcabc";
    my $udSearch = '(a)';
    my $udRep = '---$1---';

    print "before: $udStr\n";
    my $before = $udStr;
    $udStr =~ s/$udSearch/$udRep/;
    my @starts = @-;
    my @ends = @+;

    for (1..$#starts)
        {
        my $replace = substr $before, $starts[$_], $ends[$_] - $starts[$_];
        $udStr =~ s/\$$_(?!\d)/$replace/;
        }

    print "after: $udStr\n";

    Prints:

    before: abcabcabc
    after: ---a---bcabcabc

    This still doesn't fix (?{...}) and (??{...}) in $user_defined_search, but those could be filtered.



      One can also add a 'g' to lines 10 and 17 to replace all occurrences of $udSearch.

      --
      David Serrano

        Yes, I intended to :(. Must be in Sunday mode.



      Before seeing your code, I had tried something similar but limited only to $1. Now I've worked on it a bit more and come up with this. Just another WTDI (way to do it).

      use warnings;
      use strict;

      my $user_defined_string  = "There's more than one way to do it (more than one).";
      my $user_defined_search  = '(more)(.*?)(one)';
      my $user_defined_replace = '<b>$1</b>$2<b>$3</b>';

      my (@subs) = $user_defined_string =~ /$user_defined_search/;
      for my $sub (1..@subs) {
          $user_defined_replace =~ s/\$$sub/$subs[$sub-1]/ge;
      }
      print "mangled replace: $user_defined_replace\n";

      $user_defined_string =~ s/$user_defined_search/$user_defined_replace/ge;
      print "after: $user_defined_string\n";

      __END__
      mangled replace: <b>more</b> than <b>one</b>
      after: There's <b>more</b> than <b>one</b> way to do it (<b>more</b> than <b>one</b>).

      As we can see, the user is expected to have a deep understanding of Perl regexes (non-greediness in this example) if she wants to do fancy stuff ;^).

      --
      David Serrano

        This simply does not work!

        Replace the search text with "There's more than one way to do it (more or less than one)." and you'll see what I mean.

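        You don't do any backreferencing, just a simple replacement with the strings found by the first match: the second occurrence, "(more or less than one)", comes out as "(<b>more</b> than <b>one</b>)".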

        $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print
Re: How to do regex backreferences within $variable replacement text?
by ikegami (Patriarch) on Sep 17, 2005 at 21:03 UTC
    String::Interpolate does what you want, although many people will suggest the use of a template system.