Re: How to do regex backreferences within $variable replacement text?
by Zaxo (Archbishop) on Sep 17, 2005 at 19:30 UTC
|
This is a dangerous application to put on the web. You give the user an opportunity to run arbitrary code in (?{...}) or (??{...}) constructs in the regex. With the /e switch, arbitrary code can also be run in the replacement string.
| [reply] |
|
|
Re: security, that's half of the reason I would prefer not to use eval() at all (the other half is performance, since an eval() string is recompiled each time it's executed at runtime).
| [reply] |
|
|
Compilation time is going to be there no matter what solution you use; something needs to figure out which characters are plain and which are part of the name of a variable to embed, and something must actually do the embedding. True, some compilers are faster than others, but I suspect that perl is very quick at compiling a string literal, especially since it's already loaded in memory.
| [reply] [d/l] |
Re: How to do regex backreferences within $variable replacement text?
by Tanktalus (Canon) on Sep 17, 2005 at 21:41 UTC
|
Here's some example code on how to do this with a few tests. I'd invite adding some more tests and any bug reports/security issues as I've not really thought about this from a security perspective yet. It's a bit slower in that it matches twice, but it completely avoids any eval.
use strict;
use warnings;

sub substitute
{
    my ($string, $from, $to) = @_;
    $from = qr/$from/ unless ref $from and ref $from eq 'Regexp';
    my @a = $string =~ $from;
    $to =~ s/\$(\d+)/$a[$1-1]/g; # was $to =~ s/\$(\d+)/\Q$a[$1-1]/g;
    $string =~ s/$from/$to/;
    $string;
}

my @tests = (
    [ "this is some test", "(is) s(o)me", '$1 n$2t a' ],
    [ "this is some test", "is some", 'is not a' ],
);
for my $t (@tests)
{
    print "[$t->[0]]...";
    print "[",substitute(@$t), "]\n";
}
prints out:
[this is some test]...[this is not a test]
[this is some test]...[this is not a test]
which is what I expected. But, as you can see, it's not a very extensive test, so feel free to try a few more.
Update: It turns out that the \Q in the $to replacement wasn't needed.
|
|
Ok, this appears to be working, except for one thing: for the backreferenced sections, spaces are getting prepended with a backslash in the $to clause, and subsequently in the $string. Here's a test I added:
[ 'Once upon a time, Jack Roush was not the king of the NASCAR garage, but a stock-car outsider from Michigan trying to start a Winston Cup team with a small-time budget.', '(king of the NASCAR )(garage, but a stock-car)', 'HERE1$1HERE2$2HERE3' ]
And here's the output:
[Once upon a time, Jack Roush was not the king of the NASCAR garage, but a stock-car outsider from Michigan trying to start a Winston Cup team with a small-time budget.]
...
[Once upon a time, Jack Roush was not the HERE1king\ of\ the\ NASCAR\ HERE2garage\,\ but\ a\ stock\-carHERE3 outsider from Michigan trying to start a Winston Cup team with a small-time budget.]
Any ideas? I can't see where these backslashes are coming from...!
| [reply] [d/l] [select] |
|
|
When allowing the embedding of code (loosely defined), provide an escape mechanism!!! For example,
- I have no means of replacing with $1 . "00". Perl uses "${1}00".
- I have no means of replacing with '$1.00'. Perl uses the literal "\$1.00".
When adding your escape mechanism, be careful not to break existing functionality. For example,
- Continue allowing me to replace with '\$1'. Perl uses "\\$1".
Update: A solution is to replace
$to =~ s/\$(\d+)/$a[$1-1]/g;
with
$to =~ s/\\(.)|\$\{(\d+)\}|\$(\d+)/
    (defined $1
        ? $1
        : (defined $2
            ? $a[$2-1]
            : $a[$3-1]
        )
    )
/eg;
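For anyone who wants to try these escape rules in isolation, here is a minimal, self-contained sketch. The name expand_template is hypothetical, and expanding $0 or an unknown/unmatched group to the empty string is my own assumption rather than anything decided in this thread.

```perl
use strict;
use warnings;

# Expand a user-supplied replacement template against a list of captures.
# Syntax handled (mirrors the three cases listed above):
#   \x    -> literal x (so \$ is a literal dollar sign)
#   ${N}  -> capture N
#   $N    -> capture N
sub expand_template {
    my ($template, @captures) = @_;

    # Assumption: $0 and out-of-range or unmatched groups become ''.
    my $lookup = sub {
        my ($n) = @_;
        return '' if $n < 1;
        my $text = $captures[$n - 1];
        return defined $text ? $text : '';
    };

    # /e here evaluates only this fixed expression, never user text.
    $template =~ s/\\(.)|\$\{(\d+)\}|\$(\d+)/
        defined $1 ? $1 : $lookup->(defined $2 ? $2 : $3)/egs;
    return $template;
}

print expand_template('${1}00', 'x'), "\n";    # x00
print expand_template('\$1.00', 'x'), "\n";    # $1.00
print expand_template('\\\\$1', 'x'), "\n";    # \x
```

The template is scanned exactly once, so text that comes out of a capture is never looked at again for further $N or backslash sequences.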
| [reply] [d/l] [select] |
Re: How to do regex backreferences within $variable replacement text?
by GrandFather (Saint) on Sep 17, 2005 at 19:11 UTC
|
use warnings;
use strict;

my $user_defined_string = "abcabcabc";
my $user_defined_search = '(a)';
# my $user_defined_replace = '---$1---';  # pre-update version
my $user_defined_replace = '"---".$1."---"';

print "before: $user_defined_string\n";
$user_defined_string =~ s/$user_defined_search/$user_defined_replace/ee;
print "after: $user_defined_string\n";
prints:
before: abcabcabc
after: ---a---bcabcabc
Update: Fix the $user_defined_replace string
BTW: you are aware that your user can execute pretty much any code using this technique? You may want to do some aggressive filtering on the expressions that are allowed, and that may be pretty tricky to do!
Perl is Huffman encoded by design.
| [reply] [d/l] [select] |
|
|
Hmm... that didn't work either... it prints:
before: abcabcabc
after: "---".$1."---"bcabcabc
And re: security issues around executing any code, this is another reason I was hoping to avoid eval() or any of its close relatives!
As another possible idea, is there a way to precompile the replacement text of a regular expression, sort of like what qr// does for you with the search portion?
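Perl has no built-in counterpart to qr// for the replacement side as far as I know, but the idea can be approximated by parsing the template once into a closure. The following is only a sketch under that assumption: compile_replacement is a hypothetical name, it understands nothing but $N, and a lone $ is kept literally.

```perl
use strict;
use warnings;

# Parse the template once; return a closure that fills in the captures.
sub compile_replacement {
    my ($template) = @_;
    my @parts;    # each part is [ text => $string ] or [ group => $number ]
    while ($template =~ /\G(?:\$(\d+)|([^\$]+)|(\$))/gc) {
        if    (defined $1) { push @parts, [ group => $1 ] }
        elsif (defined $2) { push @parts, [ text  => $2 ] }
        else               { push @parts, [ text  => '$' ] }   # lone dollar sign
    }
    return sub {
        my @captures = @_;
        my $out = '';
        for my $part (@parts) {
            my ($kind, $value) = @$part;
            if ($kind eq 'text') { $out .= $value; next }
            # Assumption: $0 and unmatched groups expand to nothing.
            my $text = $value >= 1 ? $captures[$value - 1] : undef;
            $out .= $text if defined $text;
        }
        return $out;
    };
}

my $replace = compile_replacement('---$1---');
(my $result = 'abcabcabc') =~ s/(a)/$replace->($1)/ge;
print "after: $result\n";    # after: ---a---bc---a---bc---a---bc
```

The /e in the last substitution evaluates only the fixed expression $replace->($1); the user's text is treated as data by the closure and never reaches eval.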
| [reply] |
|
|
$user_defined_string =~ s/$user_defined_search/$user_defined_replace/eeg;
__END__
before: abcabcabc
after: ---a---bc---a---bc---a---bc
Now back to your security issue: here is a simple replacement that will reveal the username. In other words, it is really dangerous, as pointed out by Zaxo and GrandFather.
my $user_defined_replace = '`whoami`';
before: abcabcabc
after: xxx
bcxxx
bcxxx
bc
Note: in the above xxx stands for the username
Update:
I might be wrong, but I cannot see a nice way to handle user-defined substitutions... If you let them become part of your script (i.e. they supply code to be executed inside your script), then they can do whatever they want... A better approach would be to look through the string they send you, check for potentially harmful constructs like backticks and other operators, and refuse to execute it if any are present.
|
|
|
|
Sorry, coffee effect still applies: it needs two eval switches (now updated).
You can't do it without evaluation in some form. You could parse the replaced string for $n's and then replace those with their respective captured text. I'll post something in a while.
| [reply] |
|
|
DOH, just realized that you had "/ee" ... tried that and it did indeed work :) But this is still basically an eval(), right?
| [reply] |
|
|
|
|
|
|
|
I tried that, unfortunately it didn't work. I got:
before: abcabcabc
after: ---$1---bc---$1---bc---$1---bc
The "$1" is getting interpreted literally, not as a backreference.
| [reply] |
Re: How to do regex backreferences within $variable replacement text?
by GrandFather (Saint) on Sep 18, 2005 at 00:48 UTC
|
use warnings;
use strict;

my $udStr = "abcabcabc";
my $udSearch = '(a)';
my $udRep = '---$1---';

print "before: $udStr\n";
my $before = $udStr;
$udStr =~ s/$udSearch/$udRep/;
# Save the capture offsets now, before the next regex overwrites them
my @starts = @-;
my @ends = @+;
for (1..$#starts)
{
    my $replace = substr $before, $starts[$_], $ends[$_] - $starts[$_];
    # (?!\d) rather than (?=\D) so a $n at the very end of the string is replaced too
    $udStr =~ s/\$$_(?!\d)/$replace/;
}
print "after: $udStr\n";
Prints:
before: abcabcabc
after: ---a---bcabcabc
This still doesn't fix (?{...}) and (??{...}) in $user_defined_search, but those could be filtered.
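On the filtering point: perlre documents that a pattern containing (?{...}) or (??{...}) is rejected when it arrives through run-time interpolation, unless "use re 'eval'" is in effect. So a cheap first check is simply to compile the user's pattern inside a block eval and see whether perl accepts it. A minimal sketch, with compile_user_pattern being a hypothetical helper name:

```perl
use strict;
use warnings;

# Try to compile a user-supplied pattern. Returns the compiled regex,
# or undef if perl refuses it (the reason is then left in $@).
# Deliberately no "use re 'eval'" anywhere in this file.
sub compile_user_pattern {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return $re;
}

for my $pattern ('(a)', '(?{ print "gotcha\n" })', '(unbalanced') {
    my $re = compile_user_pattern($pattern);
    print defined $re ? "accepted: $pattern\n" : "rejected: $pattern\n";
}
```

This only covers code blocks and syntax errors. It does nothing about patterns that are merely very slow, and nothing about the replacement side when /e or /ee is used.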
| [reply] [d/l] [select] |
|
|
use warnings;
use strict;

my $user_defined_string = "There's more than one way to do it (more than one).";
my $user_defined_search = '(more)(.*?)(one)';
my $user_defined_replace = '<b>$1</b>$2<b>$3</b>';

my (@subs) = $user_defined_string =~ /$user_defined_search/;
for my $sub (1..@subs) {
    $user_defined_replace =~ s/\$$sub/$subs[$sub-1]/ge;
}
print "mangled replace: $user_defined_replace\n";
$user_defined_string =~ s/$user_defined_search/$user_defined_replace/ge;
print "after: $user_defined_string\n";
__OUTPUT__
mangled replace: <b>more</b> than <b>one</b>
after: There's <b>more</b> than <b>one</b> way to do it (<b>more</b> than <b>one</b>).
As we can see, the user is expected to have a deep understanding of Perl regexes (non-greediness in this example) if she wants to do fancy stuff ;^).
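One thing to watch with the code above: the template is expanded from the first match only, and that same text is then reused for every match under /g. It works here because both matches capture identical text. A hedged sketch of a variant that re-expands the template for each match follows; substitute_all and expand are hypothetical names, and only plain $N is supported.

```perl
use strict;
use warnings;

# Fill $N placeholders in the template from a list of captures.
sub expand {
    my ($template, @caps) = @_;
    $template =~ s/\$(\d+)/($1 >= 1 && defined $caps[$1 - 1]) ? $caps[$1 - 1] : ''/ge;
    return $template;
}

# Global substitution that recomputes the captures for every match.
sub substitute_all {
    my ($string, $search, $template) = @_;
    my $re = qr/$search/;
    my $original = $string;    # offsets in @- and @+ refer to this copy
    $string =~ s{$re}{
        my @start = @-;
        my @end   = @+;
        my @caps  = map {
            defined $start[$_]
                ? substr($original, $start[$_], $end[$_] - $start[$_])
                : ''
        } 1 .. $#end;
        expand($template, @caps);
    }ge;
    return $string;
}

print substitute_all('ab cd', '(\w)(\w)', '$2$1'), "\n";    # ba dc
```

The /e flags evaluate only the fixed code written here. The user's search string is compiled as a regex and the user's replacement is handled as plain data.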
| [reply] [d/l] |
Re: How to do regex backreferences within $variable replacement text?
by ikegami (Patriarch) on Sep 17, 2005 at 21:03 UTC
|
String::Interpolate does what you want, although many people will suggest the use of a template system.