cheako has asked for the wisdom of the Perl Monks concerning the following question:

Searching shows re::engine::PCRE as one possibility, but is it safe?

This question was asked, and not completely answered, on Stack Overflow:
How can I safely use regexes from user input.

The following is my program (check_pstime); it's a skeleton. Its purpose is to collect CPU-time information from matched processes, mainly for graphs. I'll admit that in this use case it'll be the OP who provides the regex, but this is an important question to have a good, solid answer to: ideally a CPAN module that will safely build a qr/pattern/ from a tainted string.

Edit:
Moved the source code to github.

Re: Untaint a string match, regular expression.
by BrowserUk (Patriarch) on May 18, 2015 at 00:02 UTC

    As a first pass at understanding your concerns, my first thought is to simply ban any regex that contains either of the extended patterns that allow for code execution: (?{...}) and (??{ ... }).

    To that end, test if the regex contains either of those patterns:

    die "Regex containing code disallowed" if $userRe =~ m[\(\?\??\{];

    Combine that with a check that the regex will compile, $userRe = qr[$userRe];, and it's hard to see what input that passed those two checks could be dangerous?
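    A minimal sketch of those two checks wrapped in one sub, assuming the regex arrives as a plain string. safe_qr is a hypothetical name, and the compile step is wrapped in eval so that a malformed pattern is reported instead of killing the program:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the two checks described above.
# 1. Refuse the embedded-code constructs (?{ ... }) and (??{ ... }).
# 2. Make sure what is left actually compiles.
# Dies with a message on failure; returns a qr// object on success.
sub safe_qr {
    my ($user_re) = @_;

    die "Regex containing code disallowed\n" if $user_re =~ m[\(\?\??\{];

    # qr// dies on a syntax error, so trap that and report it.
    my $re = eval { qr[$user_re] };
    die "Regex does not compile: $@" unless defined $re;

    return $re;
}
```

    This only screens for the two code constructs and for syntax errors; it does nothing about patterns that are merely very slow to match.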


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      :) it pretty much does that by default :)
      $ perl -e" my $re = shift; 1 =~ /$re/; " "(??{die666})"
      Eval-group not allowed at runtime, use re 'eval' in regex m/(??{die666})/ at -e line 1.

        But that is rather easily bypassed:

        C:\Users\HomeAdmin>set PERL5OPT=-Mre=eval
        C:\Users\HomeAdmin>perl -e" my $re = shift; 1 =~ /$re/; " "(?{die666})"
        C:\Users\HomeAdmin>

        I agree that anything the user could supply to the OP's program on the command line, they could equally supply to perl directly on the command line; but that's partly why I phrased my response the way I did, i.e. to tease out exactly what the OP's concerns are.

        For example, perhaps the arguments supplied to the OP's program originate from a web page interface accessible to 'external' users.

      The documentation suggests that unless use re 'eval' is in scope, there is at least some protection from embedded code.
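      To make that protection explicit rather than relying on the default, a sketch along these lines switches re 'eval' off in the scope that compiles the pattern, so that a use re 'eval' pulled into the main program (for instance via PERL5OPT=-Mre=eval) should not apply there. compile_untrusted is a hypothetical name, and this assumes no re 'eval' clears the same lexical hint that use re 'eval' sets:

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern with runtime code
# blocks explicitly disabled in this lexical scope. If the pattern contains
# (?{ ... }) or (??{ ... }), perl refuses it with
# "Eval-group not allowed at runtime", which is trapped and re-thrown.
sub compile_untrusted {
    my ($pattern) = @_;

    no re 'eval';    # do not inherit any "use re 'eval'" from outside

    my $re = eval { qr/$pattern/ };
    die "Pattern rejected: $@" unless defined $re;

    return $re;
}
```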
Re: Untaint a string match, regular expression.
by cheako (Beadle) on May 18, 2015 at 03:33 UTC

      Depending on your application, I would really only let the user input wildcards like "*" and "?" (DOS style), not regular expressions. Alternatively, if the data to be matched comes from a database, SQL-style wildcards ("%" and "_") could be used. Everything else would be escaped.

      This is easy to implement and will not create trouble with security or memory. It will also go a long way, probably for most applications.
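      A sketch of the DOS-style approach, assuming "*" means any run of characters and "?" exactly one character. wildcard_to_qr is a hypothetical helper; everything that is not a wildcard goes through quotemeta, so no regex syntax from the user survives:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a DOS-style wildcard into an anchored regex.
#   "*" -> ".*"  (any run of characters, possibly empty)
#   "?" -> "."   (exactly one character)
# Every other character is escaped with quotemeta.
sub wildcard_to_qr {
    my ($wildcard) = @_;

    my $re = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta $_
    } split //, $wildcard;

    # \A and \z anchor the whole string; /s lets "." match newlines too.
    return qr/\A$re\z/s;
}
```

      SQL-style wildcards would work the same way, with "%" and "_" in place of "*" and "?".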

      If you look at PM's Super Search, it works without any regular expressions but is still quite powerful.

Re: Untaint a string match, regular expression.
by Anonymous Monk on May 17, 2015 at 23:22 UTC

    Either you let a user specify a regex, or you don't

    Either you trust the user or you don't

    re

    If you have questions about pcre, start with its docs

    But if you can't even use the correct vocabulary, nothing is safe

      I think it would be trivial to write a subroutine in Perl that takes a string, splits it on '^', '$' and '.*', and then recombines it as a regex, with the rest of the string regex-escaped.

      That would allow a user to specify a regex and still be safe for untrusted users. It could also be expanded to support more features.
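      Roughly what that subroutine might look like, as a sketch. limited_qr is a hypothetical name; the only syntax kept is the three tokens named above, and every other character, including a lone ".", is escaped with quotemeta:

```perl
use strict;
use warnings;

# The only pieces of regex syntax the user is allowed to use.
my %is_syntax = ('^' => 1, '$' => 1, '.*' => 1);

# Hypothetical helper: split on the allowed tokens (the capturing group
# keeps them in the list), pass them through untouched, escape the rest.
sub limited_qr {
    my ($input) = @_;

    my @parts = split /(\^|\$|\.\*)/, $input;
    my $re = join '', map { $is_syntax{$_} ? $_ : quotemeta $_ } @parts;

    return qr/$re/;
}
```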

        Ok, if you think so, what is your question?
Re: Untaint a string match, regular expression.
by Anonymous Monk on May 18, 2015 at 01:17 UTC
Re: Untaint a string match, regular expression.
by Anonymous Monk on May 18, 2015 at 21:03 UTC
      Thank you for this; RE2 looks like a great option. Make sure to add -strict => 1 to your use statement (use re::engine::RE2 -strict => 1;), or re::engine::RE2 will fall back to Perl's own regex engine for patterns RE2 can't handle.
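      A sketch of compiling an untrusted pattern with re::engine::RE2 and the -strict option from this reply. Because the engine is selected by a lexically scoped use, this hypothetical re2_qr helper loads it inside a string eval compiled at run time; it assumes the module is installed, and the string eval is recompiled on every call, which is fine for a one-off pattern but not for a hot loop:

```perl
use strict;
use warnings;

# Hypothetical helper: compile $pattern with RE2 only.
# With -strict => 1, re::engine::RE2 croaks on a pattern RE2 cannot
# handle instead of silently handing it to Perl's engine.
sub re2_qr {
    my ($pattern) = @_;

    # The pragma only affects code compiled in its lexical scope, so it is
    # placed inside the string eval together with the qr//.
    my $re = eval q{
        use re::engine::RE2 -strict => 1;
        qr/$pattern/;
    };
    die "Pattern rejected: $@" unless defined $re;

    return $re;
}
```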

        The following is a quote from junyer, owner of the project on GitHub:

        RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.