Untaint a string match, regular expression.

cheako has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Untaint a string match, regular expression. by BrowserUk (Patriarch) on May 18, 2015 at 00:02 UTC
As a first pass at understanding your concerns, my first thought is to simply ban any regex that contains either of the extended patterns that allow for code execution: (?{...}) and (??{ ... }). To that end, test whether the regex contains either of those patterns: `die "Regex containing code disallowed" if $userRe =~ m[\(\?\??\{];` Combine that with a check that the regex will compile: `$userRe = qr[$userRe];` and it's hard to see how any input that passed those two checks could be dangerous.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
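The two checks described above could be combined into one helper, roughly as follows. This is a minimal sketch, not code from the thread: the name `compile_user_regex` is hypothetical, and wrapping the compile step in `eval` plus an explicit `no re 'eval'` are assumptions added for illustration. The textual check is deliberately conservative, so it may also reject a few harmless patterns that merely contain the characters `(?{`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled regex or dies with a message.
sub compile_user_regex {
    my ($user_re) = @_;

    # Check 1: refuse the code-executing constructs (?{...}) and (??{...}).
    die "Regex containing code disallowed\n"
        if $user_re =~ m[\(\?\??\{];

    # Check 2: make sure the pattern compiles. 'no re eval' keeps
    # run-time code blocks disabled within this lexical scope.
    no re 'eval';
    my $compiled = eval { qr/$user_re/ };
    die "Regex does not compile: $@" unless defined $compiled;

    return $compiled;
}
```

A caller would wrap the call in `eval { ... }` and report `$@` back to the user when the pattern is refused.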
Re^2: Untaint a string match, regular expression. by Anonymous Monk on May 18, 2015 at 00:16 UTC
:) it pretty much does that by default :) `$ perl -e" my $re = shift; 1 =~ /$re/; " "(??{die666})"` `Eval-group not allowed at runtime, use re 'eval' in regex m/(??{die666})/ at -e line 1.`
Re^3: Untaint a string match, regular expression. by BrowserUk (Patriarch) on May 18, 2015 at 00:28 UTC
But that is rather easily bypassed: `C:\Users\HomeAdmin>set PERL5OPT=-Mre=eval` `C:\Users\HomeAdmin>perl -e" my $re = shift; 1 =~ /$re/; " "(?{die666})"` `C:\Users\HomeAdmin>` I agree that anything the user could supply to the OP's program from the command line, they could equally supply to perl directly via the command line; but that's partly why I phrased my response the way I did, i.e. trying to tease out exactly what the OP's concerns are. For example, perhaps the arguments that will be supplied to the OP's program originate from a web page interface accessible to 'external' users.
Re^4: Untaint a string match, regular expression. by Anonymous Monk on May 18, 2015 at 00:58 UTC
Re^4: Untaint a string match, regular expression. by cheako (Beadle) on May 18, 2015 at 00:43 UTC
Re^2: Untaint a string match, regular expression. by cheako (Beadle) on May 18, 2015 at 00:27 UTC
The documentation suggests that unless `use re 'eval'` is in scope, there is at least some protection from embedded code.
Re^3: Untaint a string match, regular expression. by BrowserUk (Patriarch) on May 18, 2015 at 00:29 UTC
See Re^3: Untaint a string match, regular expression.
Re: Untaint a string match, regular expression. by cheako (Beadle) on May 18, 2015 at 03:33 UTC
To sum up the best advice I've seen:
1. Add `no re 'eval';` to ensure it's off.
2. Use a sub-process (fork) with BSD::Resource (even on Linux) to limit memory, and kill the child after some timeout.
Perhaps #2 is overkill; there may be a simpler method, like additional flags to #1. I would suggest adding a maxiterations option to limit the number of times the regex engine is allowed to loop, and a maxmem option.
References: Re^4: Untaint a string match, regular expression. Re^7: Untaint a string match, regular expression.
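The fork-and-timeout idea from the summary above might look something like this. It is only a sketch under stated assumptions: `match_in_child` is a hypothetical name, the exit-code convention is invented for illustration, and the memory cap via BSD::Resource is left out (it would go in the child, before the match). The child sets an alarm without installing a Perl-level handler, so the operating system terminates it when the timer fires, which sidesteps the question of whether a Perl signal handler can run while the regex engine is busy.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 (match), 0 (no match), or undef
# (pattern did not compile, or the child timed out or was killed).
sub match_in_child {
    my ($pattern, $text, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: default SIGALRM action, so the OS kills the process
        # when the timer expires, even in the middle of a match.
        $SIG{ALRM} = 'DEFAULT';
        alarm($timeout);

        no re 'eval';    # keep (?{...}) and (??{...}) disabled
        my $re = eval { qr/$pattern/ };
        POSIX::_exit(2) unless defined $re;       # did not compile
        POSIX::_exit(($text =~ $re) ? 0 : 1);     # 0 = match, 1 = no match
    }

    # Parent: wait for the child and decode its exit status.
    waitpid($pid, 0);
    return if $? & 127;                           # killed by a signal
    my $code = $? >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}
```

Only the yes/no result crosses the process boundary here; returning captures would need a pipe between child and parent.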
Re^2: Untaint a string match, regular expression. by hdb (Monsignor) on May 18, 2015 at 07:53 UTC
Depending on your application, I would really only let the user input wildcards like "*" and "?" (DOS style), not regular expressions. Alternatively, if the data to be matched comes from a database, SQL-style wildcards could be used. Everything else will be escaped. This is easy to implement and will not create trouble with security or memory. It will also go a long way, probably far enough for most applications. If you look at PM's Super Search, it works without any regular expressions but is still quite powerful.
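A DOS-style wildcard translation of the kind described above can be sketched in a few lines. The helper name `wildcard_to_regex` is hypothetical, and anchoring the pattern to the whole string is an assumption; drop the anchors for substring matching.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a DOS-style wildcard pattern into a regex.
# '*' matches any run of characters, '?' matches exactly one character,
# and every other character is escaped so it matches only itself.
sub wildcard_to_regex {
    my ($pattern) = @_;
    my $re = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta($_)
    } split //, $pattern;
    return qr/\A$re\z/s;    # anchored: the whole string must match
}

# Example: prints "match"
print "match\n" if 'report.txt' =~ wildcard_to_regex('rep*.t?t');
```

Because everything except `*` and `?` passes through `quotemeta`, input such as `(?{...})` is treated as literal text.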
Re: Untaint a string match, regular expression. by Anonymous Monk on May 17, 2015 at 23:22 UTC
Either you let a user specify a regex, or you don't. Either you trust the user or you don't. re. If you have questions about pcre, start with its docs. But if you can't even use the correct vocabulary, nothing is safe.
Re^2: Untaint a string match, regular expression. by cheako (Beadle) on May 18, 2015 at 00:14 UTC
I think it would be trivial to write a subroutine in Perl that takes a string, splits it on '^', '$' and '.*', and then recombines the pieces as a regex, with the rest of the string regex-escaped. That would allow a user to specify a regex and still be secure for untrusted users. It could also be expanded to include many more features.
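One way the subroutine described above could be written is shown here. This is a sketch only: the name `restricted_regex` is hypothetical, and the exact whitelist ('^', '$', '.*') is taken from the post. Using a capturing group in `split` keeps the separators in the result list, so they can be passed through while everything else is escaped with `quotemeta`.

```perl
use strict;
use warnings;

# Hypothetical helper: allow only '^', '$' and '.*' as regex syntax;
# every other part of the input is matched literally.
sub restricted_regex {
    my ($input) = @_;

    # The capturing group makes split return the separators too.
    my @parts = split /(\.\*|\^|\$)/, $input;

    my $re = join '', map {
        /\A(?:\.\*|\^|\$)\z/ ? $_ : quotemeta($_)
    } @parts;

    return qr/$re/;
}

# Example: prints "match"
print "match\n" if 'abcxyz' =~ restricted_regex('^abc.*z$');
```

Adding more features would mean extending the alternation in both the `split` pattern and the pass-through test.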
Re^3: Untaint a string match, regular expression. by Anonymous Monk on May 18, 2015 at 00:17 UTC
Ok, if you think so, what is your question?
Re^4: Untaint a string match, regular expression. by cheako (Beadle) on May 18, 2015 at 00:33 UTC
Re^5: Untaint a string match, regular expression. by Anonymous Monk on May 18, 2015 at 00:48 UTC
Some notes below your chosen depth have not been shown here
Re: Untaint a string match, regular expression. by Anonymous Monk on May 18, 2015 at 01:17 UTC
linkdump:
The Perl Regex Tester / Perl Regex Tester
Re^3: My Favourite Regex Tools (Was: Parsing a Variable Format String)
Mastering Regular Expressions
perlrequick
perlretut
perlre
YAPE::Regex::Explain
GraphViz::Regex
The Regex Coach (Win32 only, IIRC)
Regex Arcana
txt2re: headache relief for programmers :: regular expression generator
Regex tester
Perl 5.10 Advanced Regular Expressions
Online Regular Expression Analyzer, courtesy Ratazong (2010-08-11)
wxPPIxregexplain.pl
ppixregexplain.pl
rxrx
http://regex101.com/ - Online regex tester and debugger: JavaScript, Python, PHP, and PCRE
Security Briefs - Regular Expression Denial of Service Attacks and Defenses
http://stackoverflow.com/questions/4289923/in-which-languages-is-it-a-security-hole-to-use-user-supplied-regular-expression
Runaway Regular Expressions: Catastrophic Backtracking
Re: Untaint a string match, regular expression. by Anonymous Monk on May 18, 2015 at 21:03 UTC
Also check this thread: Stop runaway regex / and here. Perl's handling of alarms is at times inadequate, alas.
Re^2: Untaint a string match, regular expression. by cheako (Beadle) on May 19, 2015 at 00:37 UTC
Thank you for this, RE2 looks like a great option. Make sure to add (-strict => 1) to your use statement, or re::engine::RE2 will fall back to Perl's own regex engine.
Re^3: Untaint a string match, regular expression. by cheako (Beadle) on May 19, 2015 at 02:59 UTC
The following is a quote from junyer, owner of the project on GitHub. RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget - failing gracefully when exhausted - and they avoid stack overflow by eschewing recursion.