Re: User regexps
by Abigail-II (Bishop) on Jan 14, 2004 at 15:07 UTC
|
If the regexps come from users, will I have a taint problem?
Unless you explicitly enable use re 'eval'; you
won't run into the kind of problem tainting could protect
you from - the user can't inject arbitrary code.
I am worried about intentional or unintentional ill-effects beyond an error.
And you should be! No tainting is going to protect you from a
regexp that will run "forever", or that's going to exhaust
your memory or stack quickly. There's no
easy defense against this.
Would qr help with this? Or how about Safe ?
No, and no. Read the manual page of Safe.pm about things
Safe doesn't protect you from - the first things mentioned
are "Memory" and "CPU".
How could a user specify a regexp meaning, "Any string that does not have the phrase 'foo' in it?"
/^(?:(?!foo).)*$/s;
Abigail
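For anyone who wants to try the "does not contain foo" pattern above, here is a minimal sketch. The helper name lacks_foo and the sample strings are made up for illustration; only the regex itself comes from the post.

```perl
use strict;
use warnings;

# Pattern from the post: at every position, refuse to step onto the start of "foo".
my $no_foo = qr/^(?:(?!foo).)*$/s;

# Hypothetical helper: returns 1 if $text does not contain "foo", else 0.
sub lacks_foo {
    my ($text) = @_;
    return $text =~ $no_foo ? 1 : 0;
}

for my $text ("bar baz", "a food truck", "one\ntwo", "one\nfoo") {
    (my $shown = $text) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-14s %s\n", $shown, lacks_foo($text) ? "accepted" : "rejected";
}
```

The /s modifier matters here because the pattern relies on ., which would otherwise refuse to cross a newline.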
Re: User regexps
by liz (Monsignor) on Jan 14, 2004 at 15:03 UTC
|
...This is all wrapped in an eval, so I catch badly formed regexps; I am worried about intentional or unintentional ill-effects beyond an error...
You should use taint.
You should be aware of what use re 'eval' allows you to do with regular expressions.
And you should of course be aware of source code injection. Suppose the user specifies: "a/; system( 'some evil command' ); m/a" and your code is:
eval "m/$query/";
you're in deep trouble.
Liz
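A minimal sketch of the alternative to string eval, assuming the pattern arrives in a plain scalar. The name compile_user_regex is made up for illustration. The idea is to let a block eval trap compile errors while the user's text is only ever treated as a pattern, never as Perl source.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern without string eval.
# Returns a compiled regex, or undef if the pattern is rejected.
sub compile_user_regex {
    my ($query) = @_;
    # Block eval only traps the exception from a bad pattern;
    # $query is interpolated as a regex, not parsed as Perl code.
    my $re = eval { qr/$query/ };
    return $re;
}

# The hostile input from the post is just an odd-looking pattern here.
my $hostile = q{a/; system( 'some evil command' ); m/a};
my $re = compile_user_regex($hostile);
print defined $re ? "compiled as a plain pattern\n" : "rejected\n";

# Without use re 'eval', an interpolated (?{ code }) block fails to compile.
my $code_block = compile_user_regex('(?{ print "hi" })');
print defined $code_block ? "code block accepted\n" : "code block rejected\n";
```

This does nothing about patterns that are merely expensive to run; it only addresses the injection problem.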
|
| [reply] [d/l] [select] |
|
Took me a while to understand why the second was safe and the first wasn't. Thanks for putting them side by side, clearly labeled, for me to think about. I would have used the second without worry, and the first (anything with an eval on user data) always worries me, but that's just habit. Looking at these two examples bumped it back up to real understanding again, which is always nice.
Re: User regexps
by ysth (Canon) on Jan 14, 2004 at 15:49 UTC
|
As long as you are not blindly interpolating into a string
eval (e.g. eval ".../$re/..."), you can just set an alarm to guard against maliciously time-consuming regexes and you should be OK.
Abigail's is nice, but I prefer $matchtext =~ /^(?!.*foo)/
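On the alarm suggestion, here is a hedged sketch of one way to put a time limit around a match. It is a variation on the idea rather than exactly what was described: Perl may defer signal handlers until the current op finishes, so an in-process $SIG{ALRM} handler might not interrupt a long-running match. This sketch therefore runs the match in a forked child and lets the default SIGALRM action kill it. The name match_with_timeout and its return convention are assumptions, and it needs a platform with a real fork.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 (match), 0 (no match),
# or undef (bad pattern, or the child was killed, e.g. by the alarm).
sub match_with_timeout {
    my ($text, $pattern, $seconds) = @_;

    # Compile in the parent so a malformed pattern is caught right away.
    my $re = eval { qr/$pattern/ };
    return unless defined $re;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: with the default disposition, SIGALRM terminates the
        # process even if the regex engine is in the middle of a match.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;
        my $matched = $text =~ $re ? 1 : 0;
        POSIX::_exit($matched ? 0 : 1);    # skip END blocks and destructors
    }

    waitpid($pid, 0);
    return if $? & 127;                    # child died from a signal
    return ($? >> 8) == 0 ? 1 : 0;
}

my $result = match_with_timeout("some input text", 'in(put)?', 2);
print defined $result ? "result: $result\n" : "timed out or bad pattern\n";
```

Forking per match is expensive, and this only bounds wall-clock time, not memory.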
Re: User regexps
by duff (Parson) on Jan 14, 2004 at 15:03 UTC
|
Read perlre and look for where it talks about the (?{code}) construct and you tell me if they can do any damage :-)
Re: User regexps
by Roy Johnson (Monsignor) on Jan 14, 2004 at 16:33 UTC
|
| [reply] [d/l] |
|
The /s there is useless, since it only affects the . metacharacter.
Re: User regexps
by Fletch (Bishop) on Jan 14, 2004 at 15:09 UTC
|
If you can't trust your users it would be better to allow just a limited subset of regexen ( maybe only allow the characters []()A-z0-9\s.+*?|- and nothing else ).
| [reply] [d/l] |
|
And how exactly is that going to protect you from danger?
There are two potential dangers when running user-supplied
regexes: 1) arbitrary code injection and 2) resource exhaustion. 1) is not possible by default,
only if you enable use re 'eval', or use string
eval (which isn't done by the OP). 2) is a more serious problem, and can be achieved with the limited set of characters
you propose.
Abigail
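To illustrate the point about the restricted character set, here is a small sketch. It assumes one reading of the proposed set (letters, digits, whitespace, and the characters []().+*?|-), and passes_whitelist is a made-up name. It only checks which patterns get through the filter; it makes no claim about how slowly any of them run on a particular perl.

```perl
use strict;
use warnings;

# Assumed reading of the proposed set: letters, digits, whitespace, []().+*?|-
my $allowed = qr/\A[\[\]()A-Za-z0-9\s.+*?|\-]+\z/;

# Hypothetical helper: returns 1 if every character of $pattern is allowed.
sub passes_whitelist {
    my ($pattern) = @_;
    return $pattern =~ $allowed ? 1 : 0;
}

# Nested quantifiers, the usual shape behind heavy backtracking,
# need nothing outside the allowed set. A code block does.
for my $pattern ('(x+x+)+y', '(a|aa)+b', '(?{ system("evil") })') {
    printf "%-24s %s\n", $pattern,
        passes_whitelist($pattern) ? "passes" : "blocked";
}
```

So the filter stops the code block, which is already refused by default, and lets the nested quantifiers through.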
| [reply] [d/l] |
|
The resource exhaustion issues can be partly defended against using ulimit at the shell level, or suitable system calls. I don't actually know if there's a direct Perl interface to that; nothing in perlfunc anyway.
Re: User regexps
by gmpassos (Priest) on Jan 16, 2004 at 00:00 UTC
|
For code injection I use a Safe compartment in which I permit only ops that won't change the symbol table or call code (I do enable a few simple core functions, such as time and pack):
my @PERMIT_OPS = qw(
:base_mem
null stub pushmark const defined undef
preinc i_preinc predec i_predec postinc i_postinc postdec i_postdec
int hex oct abs pow multiply i_multiply divide i_divide
modulo i_modulo add i_add subtract i_subtract
left_shift right_shift bit_and bit_xor bit_or negate i_negate
not complement
lt i_lt gt i_gt le i_le ge i_ge eq i_eq ne i_ne ncmp i_ncmp
slt sgt sle sge seq sne scmp
substr stringify length ord chr
ucfirst lcfirst uc lc quotemeta trans chop schop chomp schomp
match split
list lslice reverse
cond_expr flip flop andassign orassign and or xor
lineseq scope enter leave setstate
rv2cv
leaveeval
gvsv gv gelem
padsv padav padhv padany
refgen srefgen ref
time
sort
pack unpack
) ;
use Safe ;
my $safe = Safe->new('CODE::INJECTION') ;
$safe->permit_only(@PERMIT_OPS) ;
## For regex insertion you should use:
my $RE = $safe->reval('qr/<\w+.*?>/s');
if ( "bla <b>bold</b> bla" =~ /$RE/ ) { print "has tag\n" ;}
my $RE_caption = $safe->reval('qr/(\d)/s');
my (@ret) = ( "a1 b2 c3" =~ /$RE_caption/g );
print "@ret\n" ; ## 1 2 3
I use it to support configuration files like this:
<SERVER>
port => 80
extern => 1
listen => 5
name => "Some Server Name\n and a new line"
</SERVER>
<DOMAINS>
localhost => c:\dev\www
</DOMAINS>
<MYSQL>
DB1 => { user => 'foo' , pass => '123' , host => 'domain.foo' }
</MYSQL>
So the user can give a Perl data structure as an entry (it has to fit on one line), since the compartment allows anonymous data ({}, [], "", '').
For what you want, you could probably remove some of these ops to make it more secure. Enjoy! ;-P
Graciliano M. P.
"Creativity is the expression of the liberty".