PerlMonks  

User regexps

by rkg (Hermit)
on Jan 14, 2004 at 14:52 UTC ( [id://321263]=perlquestion )

rkg has asked for the wisdom of the Perl Monks concerning the following question:

Hi.

I have an app that solicits regexps from users then uses them to filter data.

Here's a code snippet that greps objects from a list (here named @x) whose regexps match a predefined hunk of text (here named $matchtext).

my @matches = map { $_->result} grep { my $re = $_->regex; $matchtext =~ /$re/i; } @x;
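A self-contained sketch of the snippet above, for experimenting. The Filter class and its regex/result accessors are hypothetical stand-ins for the objects in @x. The eval here is placed around each individual match (an assumption; the post only says the whole thing is wrapped in an eval), so one malformed pattern counts as a non-match instead of discarding every result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the objects in @x: each carries a
# user-supplied regex and a result to report when that regex matches.
package Filter;
sub new    { my ($class, %args) = @_; return bless { %args }, $class }
sub regex  { $_[0]{regex} }
sub result { $_[0]{result} }

package main;

# Return the results of every filter whose regex matches $matchtext.
# A malformed user regex makes the match die; the per-item eval turns
# that into a non-match (error left in $@) instead of aborting the grep.
sub matching_results {
    my ($matchtext, @x) = @_;
    return map  { $_->result }
           grep { my $re = $_->regex; eval { $matchtext =~ /$re/i } } @x;
}

my @x = (
    Filter->new(regex => 'perl',  result => 'mentions Perl'),
    Filter->new(regex => '^\d+$', result => 'all digits'),
    Filter->new(regex => '(',     result => 'broken pattern'),
);

my @matches = matching_results('I like Perl', @x);
print "@matches\n";    # prints "mentions Perl"
```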
My two questions:
  • If the regexps come from users, will I have a taint problem? Can they do harm via $matchtext =~ /$re/i;, or just cause errors? (This is all wrapped in an eval, so I catch badly formed regexps; I am worried about intentional or unintentional ill-effects beyond an error.) Would qr help with this? Or how about Safe?
  • How could a user specify a regexp meaning, "Any string that does not have the phrase 'foo' in it?" My natural approach would be $matchtext !~ /foo/ or !($matchtext =~ /foo/), but here I'm stuck inside the $matchtext =~ /$re/i; construct. Is there a decent way of saying "does not contain foo" inside a standard m// match?
Thanks for your advice.

rkg

Hey, my 100th post! Whoop Whoop Whoop Whoop

Re: User regexps
by Abigail-II (Bishop) on Jan 14, 2004 at 15:07 UTC
    If the regexps come from users, will I have a taint problem?
    Unless you explicitly enable use re 'eval';, you won't have the kind of problem that tainting protects you against: the user can't inject arbitrary code.
    I am worried about intentional or unintentional ill-effects beyond an error.
    And you should! No tainting is going to protect you from a regexp that will run "forever", or that's going to exhaust your memory or stack size quickly. There's no easy defense against this.
    Would qr help with this? Or how about Safe ?
    No, and no. Read the Safe.pm manual page on the things Safe doesn't protect you from: the first ones mentioned are "Memory" and "CPU".
    How could a user specify a regexp meaning, "Any string that does not have the phrase 'foo' in it?"
    /^(?:(?!foo).)*$/s;
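A sketch of how a user could feed this pattern through the OP's $matchtext =~ /$re/i construct. The user can't append the trailing /s flag to the application's regex, so this assumes they embed it inline as (?s); user_match is a hypothetical helper standing in for the OP's code.

```perl
use strict;
use warnings;

# The OP's construct: the user supplies $re; the /i flag is fixed by the app.
sub user_match {
    my ($matchtext, $re) = @_;
    return $matchtext =~ /$re/i ? 1 : 0;
}

# Abigail's pattern with /s written inline as (?s), because the user
# cannot add trailing flags to the app's /$re/i.
my $no_foo         = '(?s)^(?:(?!foo).)*$';
my $no_foo_without = '^(?:(?!foo).)*$';    # the same pattern without /s

print user_match("bar baz", $no_foo), "\n";                      # 1
print user_match("a food fight", $no_foo), "\n";                 # 0
print user_match("ok\nFOO", $no_foo), "\n";                      # 0 (the app's /i applies)
print user_match("line one\nline two", $no_foo), "\n";           # 1
print user_match("line one\nline two", $no_foo_without), "\n";   # 0: "." stops at "\n"
```

Without (?s), any multi-line text is reported as "contains foo", because "." cannot step over the newline and $ cannot be reached.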

    Abigail

Re: User regexps
by liz (Monsignor) on Jan 14, 2004 at 15:03 UTC
    ...This is all wrapped in an eval, so I catch badly formed regexps; I am worried about intentional or unintentional ill-effects beyond an error...

    You should use taint.

    You should be aware of what use re 'eval' allows you to do with regular expressions.

    And you should of course be aware of source code injection. Suppose the user specifies: "a/; system( 'some evil command' ); m/a" and your code is:

    eval "m/$query/";
    you're in deep trouble.

    Liz

      And you should of course be aware of source code injection. Suppose the user specifies: "a/; system( 'some evil command' ); m/a" and your code is:
      eval "m/$query/";
      But that's not the code! Read the post. The code is:
      my $re = $_->regex; $matchtext =~ /$re/i;
      There's no danger of source code injection here (unless there's an unseen use re 'eval' in an enclosing scope).

      Abigail
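The difference between the two snippets quoted above can be checked directly. This sketch uses a harmless assignment to a hypothetical $pwned flag in place of system(): pasted into a string eval, the user text runs as Perl code; interpolated into a regex without use re 'eval', a (?{ ... }) block is refused at run time.

```perl
use strict;
use warnings;

our $pwned = 0;    # hypothetical flag: becomes 1 only if user text runs as code
$_ = 'a';          # the injected m/a/ statements match against $_

# Liz's dangerous case: user text is pasted into Perl source for a string eval.
my $query = 'a/; $pwned = 1; m/a';
eval "m/$query/";
my $after_string_eval = $pwned;

# The OP's case: user text is interpolated into a regex and nothing more.
$pwned = 0;
my $re = '(?{ $pwned = 1 })a';
my $compiled = eval { my $hit = ('a' =~ /$re/); 1 };
my $error = $@;
my $after_regex = $pwned;

print "string eval:   pwned=$after_string_eval\n";   # pwned=1
print "regex interp.: pwned=$after_regex\n";         # pwned=0
print "regex error:   $error" unless $compiled;      # "Eval-group not allowed at runtime, ..."
```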

        Took me a while to understand why the second was safe and the first wasn't. Thanks for putting them side by side, clearly labeled, for me to think about. I would have used the second without worry, and the first (anything with an eval on user data) always worries me, but that's just habit. Looking at these two examples bumped it back up to real understanding again, which is always nice.
Re: User regexps
by ysth (Canon) on Jan 14, 2004 at 15:49 UTC
    As long as you are not blindly interpolating into a string eval (e.g. eval ".../$re/..."), you can just set an alarm to guard against maliciously time-consuming regexes and you should be OK.
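One caveat when sketching the alarm approach: since Perl 5.8, perlipc documents that signals are deferred until the interpreter is about to run its next opcode, and a single long regex match is one opcode, so a Perl-level $SIG{ALRM} handler may not fire until the runaway match ends. The sketch below works around that by running the match in a forked child that leaves SIGALRM at its default action, which lets the kernel terminate the child. It assumes a Unix-like system with fork; match_with_timeout is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX ();

# Try $pattern (case-insensitively, as in the OP's code) against $text in a
# forked child that is killed by SIGALRM after $seconds of wall-clock time.
# Returns 1 (match), 0 (no match) or undef (timed out or invalid pattern).
sub match_with_timeout {
    my ($text, $pattern, $seconds) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: keep the default SIGALRM action. A Perl-level handler would
        # be deferred until the regex op finishes; the default action lets
        # the kernel terminate the child right away.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;
        my $matched = eval { $text =~ /$pattern/i ? 1 : 0 };
        # _exit skips END blocks and destructors inherited from the parent.
        POSIX::_exit(defined $matched ? ($matched ? 0 : 1) : 2);
    }

    waitpid $pid, 0;
    return undef if $? & 127;    # child died from a signal, e.g. SIGALRM
    my $code = $? >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

my $r = match_with_timeout("hello world", "WORLD", 5);
print defined $r ? "result: $r\n" : "timed out or invalid pattern\n";
```

Forking per match is expensive; it trades speed for a hard time limit that doesn't depend on Perl's signal delivery.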

    Abigail's is nice, but I prefer $matchtext =~ /^(?!.*foo)/
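A detail worth checking in the lookahead version: without /s, "." stops at a newline, so /^(?!.*foo)/ only looks at the first line of $matchtext. This sketch compares the pattern as posted, the same pattern with /s, and the inline (?s) form a user could supply through the OP's /$re/i construct.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line mentions foo";

# ysth's pattern as posted: ".*" cannot cross the newline, so only line one is checked.
my $plain  = ($text =~ /^(?!.*foo)/)  ? 1 : 0;
# With /s, ".*" can run across newlines and the whole string is checked.
my $dotall = ($text =~ /^(?!.*foo)/s) ? 1 : 0;
# Inside the OP's $matchtext =~ /$re/i construct, the user can embed the flag.
my $re   = '(?s)^(?!.*foo)';
my $user = ($text =~ /$re/i) ? 1 : 0;

print "plain=$plain dotall=$dotall user=$user\n";   # plain=1 dotall=0 user=0
```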

Re: User regexps
by duff (Parson) on Jan 14, 2004 at 15:03 UTC

    Read perlre, look for where it talks about the (?{code}) construct, and then tell me whether they can do any damage :-)

      Now, tell us how the user is going to put in a (?{code}) construct if use re 'eval'; hasn't been enabled.

      Now, running regexes supplied by a possibly hostile user can cause problems, but the problems are not the possibility of running arbitrary code; at least not by default.

      Abigail

Re: User regexps
by Roy Johnson (Monsignor) on Jan 14, 2004 at 16:33 UTC
    How could a user specify a regexp meaning, "Any string that does not have the phrase 'foo' in it?"
    Abigail gave an elegant one. I would expect this variation to be a little more efficient by virtue of only engaging lookahead upon encountering an f: /^(?:[^f]*f(?!oo))*[^f]*$/s;
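A quick sketch for checking that this variation accepts and rejects the same strings as Abigail's pattern; the sample strings are arbitrary. It checks equivalence only; the efficiency expectation is not measured here (the core Benchmark module's cmpthese is one way to do that).

```perl
use strict;
use warnings;

# Abigail's pattern: the lookahead runs before every character.
my $abigail = qr/^(?:(?!foo).)*$/s;
# Roy Johnson's variation: the lookahead runs only right after each "f".
my $roy = qr/^(?:[^f]*f(?!oo))*[^f]*$/s;

my @samples = ("", "bar", "foo", "ffoo", "fo", "off", "xfoox", "f\nf oo", "a food\nfight");
for my $s (@samples) {
    my $by_abigail = $s =~ $abigail ? 1 : 0;
    my $by_roy     = $s =~ $roy     ? 1 : 0;
    (my $shown = $s) =~ s/\n/\\n/g;    # show newlines as \n in the report
    printf "%-15s abigail=%d roy=%d\n", qq{"$shown"}, $by_abigail, $by_roy;
}
```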

    The PerlMonk tr/// Advocate
      The //s there is useless, since it only affects the . metacharacter.
        As useless as the pockets on my Cargo Cult pants from Old Navy®.

        Update: Your suggestion of /^(?!.*foo)/ is the Right Way To Do It, anyway.


        The PerlMonk tr/// Advocate
Re: User regexps
by Fletch (Bishop) on Jan 14, 2004 at 15:09 UTC

    If you can't trust your users, it would be better to allow just a limited subset of regexen (maybe only allow the characters []()A-z0-9\s.+*?|- and nothing else).
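A sketch of the whitelist check, which also illustrates Abigail's objection in the reply: it blocks the (?{ ... }) syntax, yet still admits nested quantifiers of the kind that can make matching very slow (the pattern is only checked here, not run). One assumption: "A-z" is written as "A-Za-z", because the ASCII range A-z also contains [ \ ] ^ _ and `, which would let backslash escapes through. passes_whitelist is a hypothetical name.

```perl
use strict;
use warnings;

# Fletch's proposed character whitelist, with A-z tightened to A-Za-z.
sub passes_whitelist {
    my ($re) = @_;
    return $re =~ /\A[\[\]()A-Za-z0-9\s.+*?|-]*\z/ ? 1 : 0;
}

print passes_whitelist('(?{ print 1 })'), "\n";           # 0: braces are rejected
print passes_whitelist('(a|aa)*(a|aa)*(a|aa)*b'), "\n";   # 1: nested quantifiers pass
```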

      And that's going to prevent you from danger exactly how?

      There are two potential dangers when running user-supplied regexes: 1) arbitrary code injection and 2) resource exhaustion. 1) is not possible by default, only if you enable use re 'eval' or use string eval (which the OP doesn't do). 2) is a more serious problem, and can be achieved with the limited set of characters you propose.

      Abigail

        The resource exhaustion issues can be partly defended against using ulimit at the shell level, or suitable system calls. I don't actually know if there's a direct Perl interface to that; nothing in perlfunc anyway.
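A sketch of the ulimit route, driven from Perl: the untrusted match runs in a child whose CPU time is capped by the shell's ulimit builtin. It assumes a Unix-like system whose /bin/sh supports "ulimit -t"; run_with_cpu_limit is a hypothetical helper. From inside Perl itself, the CPAN module BSD::Resource provides setrlimit, not shown here.

```perl
use strict;
use warnings;

# Run a small Perl program in a child whose CPU time is capped by the shell's
# ulimit builtin. Returns the raw wait status ($?); a child killed for going
# over the limit shows up as a signal (typically SIGXCPU) in the low 7 bits.
sub run_with_cpu_limit {
    my ($cpu_seconds, $perl_code, @args) = @_;
    # In the sh script, $1 is the limit; after "shift", "$@" is the command.
    system('sh', '-c', 'ulimit -t "$1" && shift && exec "$@"',
           'sh', $cpu_seconds, $^X, '-e', $perl_code, @args);
    return $?;
}

# The untrusted part: match a user-supplied regex against some text.
my $matcher = 'my ($text, $re) = @ARGV; exit($text =~ /$re/i ? 0 : 1)';

my $status = run_with_cpu_limit(5, $matcher, 'hello world', 'WORLD');
if    ($status & 127) { print "killed by signal ", $status & 127, "\n" }
elsif ($status == 0)  { print "match\n" }
else                  { print "no match\n" }
```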
      Whether you can trust your users isn't an issue. Whether you can trust anyone who might come across your UI anytime in the future is an issue.

      Out of general paranoia our CGI wrapper drops any characters that are not in {A-Za-z0-9-\/.@,: }. (And ':' was a recent addition, to support entering URLs.)
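A sketch of that kind of input scrubbing, using my reading of the listed set (letters, digits, hyphen, slash, dot, @, comma, colon, space); scrub is a hypothetical name. tr/// with /c (complement the list) and /d (delete) drops every character outside the set.

```perl
use strict;
use warnings;

# Keep only the allowed characters; "\-" is a literal hyphen inside tr.
sub scrub {
    my ($input) = @_;
    (my $clean = $input) =~ tr{A-Za-z0-9\-/.@,: }{}cd;
    return $clean;
}

print scrub('name=<script>alert(1)</script>'), "\n";   # namescriptalert1/script
print scrub('http://example.com/a b'), "\n";           # unchanged
```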

      --
      Spring: Forces, Coiled Again!
Re: User regexps
by gmpassos (Priest) on Jan 16, 2004 at 00:00 UTC
    To guard against code injection I use a Safe compartment in which I enable only ops that won't change the symbol table or call code (I do enable some simple CORE functions, like time, pack, etc.):
    my @PERMIT_OPS = qw(
        :base_mem null stub pushmark const defined undef
        preinc i_preinc predec i_predec postinc i_postinc postdec i_postdec
        int hex oct abs pow multiply i_multiply divide i_divide
        modulo i_modulo add i_add subtract i_subtract
        left_shift right_shift bit_and bit_xor bit_or negate i_negate not complement
        lt i_lt gt i_gt le i_le ge i_ge eq i_eq ne i_ne ncmp i_ncmp
        slt sgt sle sge seq sne scmp
        substr stringify length ord chr ucfirst lcfirst uc lc quotemeta trans
        chop schop chomp schomp match split list lslice reverse
        cond_expr flip flop andassign orassign and or xor
        lineseq scope enter leave setstate rv2cv leaveeval
        gvsv gv gelem padsv padav padhv padany refgen srefgen ref
        time sort pack unpack
    );

    use Safe;
    $safe = Safe->new('CODE::INJECTION');
    $safe->permit_only(@PERMIT_OPS);

    ## For regex insertion you should use:
    my $RE = $safe->reval('qr/<\w+.*?>/s');
    if ( "bla <b>bold</b> bla" =~ /$RE/ ) { print "has tag\n"; }

    my $RE_caption = $safe->reval('qr/(\d)/s');
    my (@ret) = ( "a1 b2 c3" =~ /$RE_caption/g );
    print "@ret\n";    ## 1 2 3

    I use it to enable configuration files like this:

    <SERVER>
      port => 80
      extern => 1
      listen => 5
      name => "Some Server Name\n and a new line"
    </SERVER>
    <DOMAINS>
      localhost => c:\dev\www
    </DOMAINS>
    <MYSQL>
      DB1 => { user => 'foo' , pass => '123' , host => 'domain.foo' }
    </MYSQL>

    So the user can set a Perl data structure as an entry (though it needs to be on one line), since I enable anonymous data ({}, [], "", '') in the compartment.

    For your purposes, you could probably disable some more ops and make it even more secure. Enjoy! ;-P

    Graciliano M. P.
    "Creativity is the expression of the liberty".

Node Type: perlquestion [id://321263]
Approved by coreolyn
Front-paged by Roy Johnson