User regexps

rkg has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: User regexps by Abigail-II (Bishop) on Jan 14, 2004 at 15:07 UTC
> If the regexps come from users, will I have a taint problem?

Unless you explicitly enable `use re 'eval';` you won't have problems where tainting can help you prevent getting damaged - the user can't inject arbitrary code.

> I am worried about intentional or unintentional ill-effects beyond an error.

And you should! No tainting is going to protect you from a regexp that will run "forever", or that's going to exhaust your memory or stack size in a quick fashion. There's no easy defense against this.

> Would qr help with this? Or how about Safe?

No, and no. Read the manual page of Safe.pm about things Safe doesn't protect you from - the first things mentioned are "Memory" and "CPU".

> How could a user specify a regexp meaning, "Any string that does not have the phrase 'foo' in it?"

`/^(?:(?!foo).)*$/s;`

Abigail
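A quick way to check the negative pattern above is to run it over a few strings. This is only a minimal sketch: the pattern is the one from the post, while the sample strings and the `%result` hash are made up here for illustration.

```perl
use strict;
use warnings;

# Pattern from the post: consume one character at a time, but only
# at positions where "foo" does not start.
my $no_foo = qr/^(?:(?!foo).)*$/s;

# Hypothetical sample inputs, not taken from the thread.
my @samples = ("bar", "food", "ba\nr", "ba\nfoo");

my %result;
for my $s (@samples) {
    $result{$s} = ($s =~ $no_foo) ? 1 : 0;
    (my $shown = $s) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-12s %s\n", "'$shown'", $result{$s} ? "no foo" : "contains foo";
}
```

The `/s` modifier matters here because it lets `.` step over newlines, so a "foo" on a later line is still noticed.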
Re: User regexps by liz (Monsignor) on Jan 14, 2004 at 15:03 UTC
> ...This is all wrapped in an eval, so I catch badly formed regexps; I am worried about intentional or unintentional ill-effects beyond an error...

You should use taint. You should be aware of what `use re 'eval'` allows you to do with regular expressions. And you should of course be aware of source code injection. Suppose the user specifies: "a/; system( 'some evil command' ); m/a" and your code is: `eval "m/$query/";` Then you're in deep trouble.

Liz
Re: User regexps by Abigail-II (Bishop) on Jan 14, 2004 at 15:39 UTC
> And you should of course be aware of source code injection. Suppose the user specifies: "a/; system( 'some evil command' ); m/a" and your code is: `eval "m/$query/";`

But that's not the code! Read the post. The code is: `my $re = $_->regex; $matchtext =~ /$re/i;` There's no danger of source code injection here (unless there's an unseen `use re 'eval'` in an enclosing scope).

Abigail
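The difference between the two variants can be seen with a harmless stand-in for the hostile input. This is a minimal sketch, not code from the thread: instead of calling `system`, the made-up input sets a flag (`$main::pwned`, a hypothetical name), so it is easy to tell whether the input was ever executed as Perl.

```perl
use strict;
use warnings;

our $pwned = 0;

# Stand-in for the hostile input: sets a flag rather than running a command.
my $query = q{a/; $main::pwned = 1; m/a};

# Variant 1: plain interpolation. The input is compiled as a pattern only,
# so the "$" is just an end-of-string anchor and nothing is executed.
my $matched = eval { 'banana' =~ /$query/i ? 1 : 0 };
my $after_interpolation = $pwned;

# Variant 2: string eval. The input becomes part of the program text:
#   m/a/; $main::pwned = 1; m/a/
{
    local $_ = 'banana';
    eval "m/$query/";
}
my $after_string_eval = $pwned;

print "after interpolation: pwned=$after_interpolation\n";
print "after string eval:   pwned=$after_string_eval\n";
```

In the first variant the slash and semicolon are ordinary pattern characters; in the second they end the match operator and start a new statement.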
Re: Re: User regexps by dd-b (Monk) on Jan 14, 2004 at 18:22 UTC
Took me a while to understand why the second was safe and the first wasn't. Thanks for putting them side by side, clearly labeled, for me to think about. I would have used the second without worry, and the first (anything with an eval on user data) always worries me, but that's just habit. Looking at these two examples bumped it back up to real understanding again, which is always nice.
Re: User regexps by ysth (Canon) on Jan 14, 2004 at 15:49 UTC
As long as you are not blindly interpolating into a string eval (e.g. `eval ".../$re/..."`), you can just set an alarm to guard against maliciously time-consuming regexes and you should be OK. Abigail's is nice, but I prefer `$matchtext =~ /^(?!.*foo)/`
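One way to put a time limit on a match is sketched below. A caveat first: depending on the Perl version, deferred ("safe") signals may hold back a `$SIG{ALRM}` handler until the current opcode finishes, and a single regex match is one opcode. This sketch therefore runs the match in a child process and lets the operating system's default SIGALRM action kill it. It assumes a Unix-like system with a real `fork`; `match_with_timeout` is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 (match), 0 (no match),
# or undef (pattern malformed, or the match was killed by the alarm).
sub match_with_timeout {
    my ($text, $pattern, $seconds) = @_;

    my $re = eval { qr/$pattern/ };
    return undef unless defined $re;       # badly formed pattern

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: no Perl-level handler, so SIGALRM terminates the process.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;
        POSIX::_exit($text =~ $re ? 0 : 1); # skip END blocks and buffers
    }

    waitpid($pid, 0);
    return undef if $? & 127;              # child died from a signal
    return ($? >> 8) == 0 ? 1 : 0;
}

print "foobar vs foo: ", match_with_timeout("foobar", "foo", 5), "\n";
print "bar vs foo:    ", match_with_timeout("bar", "foo", 5), "\n";
```

The cost is one fork per match, which is fine for an occasional user query but heavy inside a tight loop. This bounds CPU time only; memory use would need a separate limit.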
Re: User regexps by duff (Parson) on Jan 14, 2004 at 15:03 UTC
Read perlre and look for where it talks about the `(?{code})` construct and you tell me if they can do any damage :-) duff
Re: User regexps by Abigail-II (Bishop) on Jan 14, 2004 at 15:33 UTC
Now, tell us how the user is going to put in a `(?{code})` construct if `use re 'eval';` hasn't been enabled. Now, running regexes supplied by a possibly hostile user can give problems - but the problems are not the possibility of running arbitrary code; at least not by default. Abigail
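The default behaviour is easy to confirm. In this minimal sketch the "user" pattern contains a code block that would set a flag (`$main::ran`, a name invented for the example). Without `use re 'eval'` in scope, Perl refuses to compile the interpolated pattern and the block never runs.

```perl
use strict;
use warnings;

our $ran = 0;

# Hypothetical hostile input: a code block embedded in the pattern text.
my $user_pattern = 'a(?{ $main::ran = 1 })';

# No "use re 'eval'" here, so compiling the interpolated pattern dies.
my $ok = eval { my $m = ('a' =~ /$user_pattern/); 1 };
my $error = $ok ? '' : $@;

print "code block ran: $ran\n";
print "error: $error" if $error;
```

The error text begins with "Eval-group not allowed at runtime", which is what the surrounding `eval` block in the original poster's code would catch.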
Re: User regexps by Roy Johnson (Monsignor) on Jan 14, 2004 at 16:33 UTC
> How could a user specify a regexp meaning, "Any string that does not have the phrase 'foo' in it?"

Abigail gave an elegant one. I would expect this variation to be a little more efficient by virtue of only engaging lookahead upon encountering an f: `/^(?:[^f]*f(?!oo))*[^f]*$/s;`

The PerlMonk `tr///` Advocate
Re: Re: User regexps by ysth (Canon) on Jan 14, 2004 at 16:50 UTC
`//s` there is useless, since it only affects `.`
Re: Re: Re: User regexps by Roy Johnson (Monsignor) on Jan 14, 2004 at 16:59 UTC
As useless as the pockets on my Cargo Cult pants from Old Navy®. Update: Your suggestion of `/^(?!.*foo)/` is the Right Way To Do It, anyway. The PerlMonk `tr///` Advocate
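The three approaches in this sub-thread can be compared side by side. This is a sketch under stated assumptions: the "unrolled" spelling is one way to write the look-ahead-only-at-f idea and may not be the exact pattern posted, the `/s` on the lookahead form is added here, and the sample strings are invented.

```perl
use strict;
use warnings;

my %pattern = (
    tempered  => qr/^(?:(?!foo).)*$/s,           # Abigail's form
    unrolled  => qr/^(?:[^f]*f(?!oo))*[^f]*$/,   # assumed spelling of the "only at f" idea
    lookahead => qr/^(?!.*foo)/s,                # ysth's form, with /s added here
);

# Hypothetical inputs: 1 means "contains no foo".
my %expected = (
    "bar"      => 1,
    "food"     => 0,
    "fluffy"   => 1,
    "fofoo"    => 0,
    "ba\nfoo"  => 0,
);

my $disagreements = 0;
for my $name (sort keys %pattern) {
    for my $s (sort keys %expected) {
        my $got = ($s =~ $pattern{$name}) ? 1 : 0;
        $disagreements++ if $got != $expected{$s};
    }
}
print "disagreements: $disagreements\n";

# Without /s, "." stops at the newline, so a "foo" on the second line is missed.
my $missed = ("ba\nfoo" =~ /^(?!.*foo)/) ? 1 : 0;
print "lookahead without /s misses foo after a newline: $missed\n";
```

The last check shows that `/s` is not always decoration: it changes nothing for a pattern with no `.` in it, but the lookahead form needs it when the input can span several lines.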
Re: User regexps by Fletch (Bishop) on Jan 14, 2004 at 15:09 UTC
If you can't trust your users it would be better to allow just a limited subset of regexen ( maybe only allow the characters `[]()A-z0-9\s.+*?\|-` and nothing else ).
Re: User regexps by Abigail-II (Bishop) on Jan 14, 2004 at 15:44 UTC
And that's going to prevent you from danger exactly how? There are two potential dangers when running user-supplied regexes: 1) arbitrary code injection and 2) resource exhaustion. 1) is not possible by default, only if you enable `use re 'eval'`, or use string eval (which isn't done by the OP). 2) is a more serious problem, and can be achieved with the limited set of characters you propose. Abigail
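To make the point concrete, here is a minimal sketch of such a character whitelist. The character class is an approximation of the proposed set (letters are written as `A-Za-z`), and `passes_whitelist` plus the candidate patterns are hypothetical. The candidates are only checked against the whitelist, never run.

```perl
use strict;
use warnings;

# Approximation of the proposed set: brackets, parens, letters, digits,
# whitespace, and the characters . + * ? \ | -
my $allowed = qr/\A[\[\]()A-Za-z0-9\s.+*?\\|\-]*\z/;

# Hypothetical helper: 1 if every character is on the whitelist, else 0.
sub passes_whitelist {
    my ($candidate) = @_;
    return $candidate =~ $allowed ? 1 : 0;
}

my @candidates = (
    'foo|bar',               # ordinary alternation
    '(a+)+b',                # nested quantifiers, built from allowed characters only
    '(?{ system("x") })',    # code block: rejected because of { } and "
);

my %verdict = map { $_ => passes_whitelist($_) } @candidates;
printf "%-22s %s\n", $_, $verdict{$_} ? "accepted" : "rejected" for @candidates;
```

The filter does stop the code-block syntax, which is not a danger by default anyway, yet it accepts a nested-quantifier pattern of the kind usually cited for heavy backtracking. Whether that particular pattern is actually slow depends on the Perl version and the input.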
Re: Re: User regexps by dd-b (Monk) on Jan 14, 2004 at 18:13 UTC
The resource exhaustion issues can be partly defended against using ulimit at the shell level, or suitable system calls. I don't actually know if there's a direct Perl interface to that; nothing in perlfunc anyway.
Re: Re: Re: User regexps by Fletch (Bishop) on Jan 15, 2004 at 01:09 UTC
Re: Re: User regexps by paulbort (Hermit) on Jan 15, 2004 at 20:37 UTC
Whether you can trust your users isn't an issue. Whether you can trust anyone who might come across your UI anytime in the future is an issue. Out of general paranoia our CGI wrapper drops any characters that are not in {A-Za-z0-9-\/.@,: }. (And ':' was a recent addition, to support entering URLs.) -- Spring: Forces, Coiled Again!
Re: User regexps by gmpassos (Priest) on Jan 16, 2004 at 00:00 UTC
For code injection I use a Safe compartment, where I enable only the ops that won't change the symbol table or call CODE (since I enable some simple CORE functions, like time, pack, etc.):

    # Ops allowed inside the compartment.
    my @PERMIT_OPS = qw(
      :base_mem null stub pushmark const defined undef
      preinc i_preinc predec i_predec postinc i_postinc postdec i_postdec
      int hex oct abs pow multiply i_multiply divide i_divide modulo i_modulo
      add i_add subtract i_subtract
      left_shift right_shift bit_and bit_xor bit_or negate i_negate not complement
      lt i_lt gt i_gt le i_le ge i_ge eq i_eq ne i_ne ncmp i_ncmp
      slt sgt sle sge seq sne scmp
      substr stringify length ord chr ucfirst lcfirst uc lc quotemeta trans
      chop schop chomp schomp match split
      list lslice reverse cond_expr flip flop andassign orassign and or xor
      lineseq scope enter leave setstate rv2cv leaveeval
      gvsv gv gelem padsv padav padhv padany refgen srefgen ref
      time sort pack unpack
    );

    use Safe;

    my $safe = Safe->new('CODE::INJECTION');
    $safe->permit_only(@PERMIT_OPS);

    ## For regex insertion you should use:
    my $RE = $safe->reval('qr/<\w+.*?>/s');

    if ( "bla <b>bold</b> bla" =~ /$RE/ ) { print "has tag\n"; }

    my $RE_caption = $safe->reval('qr/(\d)/s');

    my (@ret) = ( "a1 b2 c3" =~ /$RE_caption/g );
    print "@ret\n";    ## 1 2 3

I use it to enable configuration files like this:

    <SERVER>
    port => 80
    extern => 1
    listen => 5
    name => "Some Server Name\n and a new line"
    </SERVER>
    <DOMAINS>
    localhost => c:\dev\www
    </DOMAINS>
    <MYSQL>
    DB1 => { user => 'foo' , pass => '123' , host => 'domain.foo' }
    </MYSQL>

So the user can set a Perl data structure as an entry (but it needs to be on one line), since I enable in the compartment the use of anonymous variables ({},[],"",''). For what you want, maybe you can unset some ops and make it more secure. Enjoy! ;-P

Graciliano M. P. "Creativity is the expression of the liberty".

