Re: Untaint a string match, regular expression.
by BrowserUk (Patriarch) on May 18, 2015 at 00:02 UTC
As a first pass at understanding your concerns, my first thought is to simply ban any regex that contains either of the extended patterns that allow for code execution: (?{...}) and (??{ ... }).
To that end, test if the regex contains either of those patterns:
die "Regex containing code disallowed" if $userRe =~ m[\(\?\??\{];
Combine that with a check that the regex compiles: $userRe = qr[$userRe]; (wrapped in an eval, since qr dies on an invalid pattern), and it's hard to see what input that passed both checks could be dangerous.
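A minimal sketch of the two checks combined, in case it helps. The sub name vet_user_regex and the sample patterns are made up for illustration, and this only addresses embedded code and syntax errors, not patterns that are slow to match:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled regex, or dies with a reason.
sub vet_user_regex {
    my ($userRe) = @_;

    # Check 1: refuse the code-executing constructs (?{ ... }) and (??{ ... }).
    die "Regex containing code disallowed\n"
        if $userRe =~ m[\(\?\??\{];

    # Check 2: make sure the pattern compiles; qr// dies on a syntax error,
    # so trap that with eval and report it.
    my $compiled = eval { qr/$userRe/ };
    die "Regex failed to compile: $@" unless defined $compiled;

    return $compiled;
}

# Try one acceptable pattern, one with embedded code, one with bad syntax.
for my $candidate ('^foo\d+$', '(?{ die 666 })', 'foo(') {
    my $re = eval { vet_user_regex($candidate) };
    print defined $re ? "ok:       $candidate\n" : "rejected: $candidate\n";
}
```

Only the first of the three sample patterns should come back as a compiled regex; the other two are rejected by check 1 and check 2 respectively.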
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
| [reply] [d/l] [select] |
:) it pretty much does that by default :)
$ perl -e" my $re = shift; 1 =~ /$re/; " "(??{die666})"
Eval-group not allowed at runtime, use re 'eval' in regex m/(??{die666})/ at -e line 1.
| [reply] [d/l] |
C:\Users\HomeAdmin>set PERL5OPT=-Mre=eval
C:\Users\HomeAdmin>perl -e" my $re = shift; 1 =~ /$re/; " "(?{die666})"
C:\Users\HomeAdmin>
I agree that anything the user could supply to the OP's program from the command line, they could equally supply to perl directly via the command line; but that's partly why I phrased my response the way I did, i.e. trying to tease out exactly what the OP's concerns are.
For example, perhaps the arguments that will be supplied to the OP's program originate from a web page interface accessible to 'external' users.
| [reply] [d/l] |
The documentation suggests that unless use re 'eval' is in scope, there is at least some protection from embedded code.
| [reply] [d/l] |
Re: Untaint a string match, regular expression.
by cheako (Beadle) on May 18, 2015 at 03:33 UTC
| [reply] [d/l] |
Depending on your application, I would really only let the user input wildcards like "*" and "?" (DOS style), not regular expressions. Alternatively, if the data to be matched comes from a database, SQL-style wildcards could be used instead. Everything else would be escaped.
This is easy to implement and will not create trouble with security or memory. It will also go a long way, probably far enough for most applications.
If you look at PM's Super Search, it works without any regular expressions but is still quite powerful.
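For what it's worth, the wildcard approach might look something like the sketch below. The sub name wildcard_to_regex is hypothetical, and I am assuming "*" means "any run of characters", "?" means "exactly one character", and the whole string has to match:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a DOS-style wildcard pattern into an anchored regex.
sub wildcard_to_regex {
    my ($pattern) = @_;

    my $body = '';
    # The capture group makes split return the wildcard characters too.
    for my $piece (split /([*?])/, $pattern) {
        if    ($piece eq '*') { $body .= '.*' }              # any run of characters
        elsif ($piece eq '?') { $body .= '.'  }              # exactly one character
        else                  { $body .= quotemeta $piece }  # everything else is literal
    }

    # /s lets the wildcards cross newlines; \A and \z anchor the whole string.
    return qr/\A$body\z/s;
}

my $re = wildcard_to_regex('report_??.t*');
for my $name ('report_01.txt', 'report_1.txt') {
    print "$name: ", ($name =~ $re ? "match" : "no match"), "\n";
}
```

Because every literal piece goes through quotemeta, something like "(?{ ... })" in the input ends up matching only those literal characters.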
| [reply] |
Re: Untaint a string match, regular expression.
by Anonymous Monk on May 17, 2015 at 23:22 UTC
Either you let a user specify a regex, or you don't
Either you trust the user or you don't
re
If you have questions about pcre, start with its docs
But if you can't even use the correct vocabulary, nothing is safe
| [reply] |
I think it would be trivial to write a subroutine in Perl that takes a string, splits it on '^', '$' and '.*', and then recombines the pieces as a regex, with the rest of the string regex-escaped. That would let a user specify a (restricted) regex while staying secure against untrusted users. It could also be expanded to include many more features.
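A rough sketch of the kind of subroutine I have in mind. The name restricted_regex is made up, and this only covers the three tokens mentioned above; it makes no promise about how long a pattern with many '.*' takes to match:

```perl
use strict;
use warnings;

# Hypothetical helper: only '^', '$' and '.*' keep their regex meaning;
# every other character is escaped with quotemeta.
sub restricted_regex {
    my ($input) = @_;

    my $body = '';
    # The capture group makes split return the allowed tokens as well
    # as the literal text between them.
    for my $piece (split /(\.\*|\^|\$)/, $input) {
        $body .= $piece =~ /\A(?:\.\*|\^|\$)\z/
               ? $piece             # allowed token, passed through as-is
               : quotemeta $piece;  # literal text, escaped
    }
    return qr/$body/;
}

my $re = restricted_regex('^foo.*(bar)$');
for my $text ('foo and (bar)', 'foo and bar') {
    print "$text: ", ($text =~ $re ? "match" : "no match"), "\n";
}
```

Here the parentheses in the input are treated as literal characters, so only the first sample string matches.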
| [reply] |
Ok, if you think so, what is your question?
| [reply] |
Re: Untaint a string match, regular expression.
by Anonymous Monk on May 18, 2015 at 01:17 UTC
| [reply] |
Re: Untaint a string match, regular expression.
by Anonymous Monk on May 18, 2015 at 21:03 UTC
| [reply] |
Thank you for this, RE2 looks like a great option. Make sure to add (-strict => 1) to your use statement, or re::engine::RE2 will fall back to Perl's regex engine.
| [reply] |
The following is a quote from junyer, owner of the project on GitHub:
RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.
| [reply] |