Validating Regular Expression

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Validating Regular Expression by davido (Cardinal) on Feb 15, 2011 at 16:29 UTC
Eval is great for getting perl to tell you if something is technically legal or not. But I have a concern. The task is a little more complex than that. You will have to prevent the `(?{code})` construct from finding its way into your user's input, lest they manage to inject some unsavory tidbit into your system. That's just the start. Will your users have the ability to input RE's that are expressed in /x terms? (In other words, where whitespace is not relevant.) They could always use `(?x:.....)`, or any of the other legal options to sort of control the RE's destiny, possibly in a way that you weren't intending. I think that the greater task is not in validating the RE's viability, so much as evaluating its security risk. As for validating its viability, eval isn't such a bad tool. You would be able to let perl tell you if there's a problem. But perl cannot know if there's a security breech. I don't know all the possible risks that you should test for. But at minimum disallowing code blocks would be a start. Dave	[reply] [d/l] [select]
Re^2: Validating Regular Expression by ikegami (Patriarch) on Feb 15, 2011 at 17:46 UTC
As mentioned, patterns that execute arbitrary code are disabled by default when interpolating. There is one thing to keep in mind though: It's easy to construct a pattern that will take longer than the life of the universe to (fail to) match.	[reply]
Re^2: Validating Regular Expression by kennethk (Abbot) on Feb 15, 2011 at 16:42 UTC
I could just be timid, but I would not trust myself to filter this type of input from an untrusted source. Nefarious people are sometimes smarter than me and always have more time and motivation, so I would not put this tool anywhere they could use it to elevate permissions. Sure you could swap my routine above to `sub validate{ my $pat = shift; return if $pat =~ /\(\?{1,2}\{/; return eval{qr/$pat/}; }` [download] to filter out code evaluation and pattern code expressions (A bit of magic: executing Perl code in a regular expression), but I'm sure I'd miss some clever escape or exploit of the regex engine. This is the sort of thing where the feature is not worth the effort to secure it properly.	[reply] [d/l]
Re^3: Validating Regular Expression by Anonymous Monk on Feb 15, 2011 at 17:54 UTC
Hi, I'm not too worried about them trying to attack using this. It's their own PC it's running on! Would be a pain if it was a web application or similar but this is just a local Tk app. If they can escalate privileges in any way then that's a fault in the OS or the IT setup. Either way, Not My Problem (Tm)!! Graham	[reply]
Re^2: Validating Regular Expression by JavaFan (Canon) on Feb 15, 2011 at 16:53 UTC
You will have to prevent the (?{code}) construct from finding its way into your user's input, lest they manage to inject some unsavory tidbit into your system. That's not more complicated than not typing `use re 'eval';`. That is, by default, Perl doesn't honour `(?{code})` constructs in interpolated code.	[reply] [d/l] [select]
Re^2: Validating Regular Expression by Anonymous Monk on Feb 15, 2011 at 16:37 UTC
Hi, Thanks for the answers. I'm not too worried about security, it's a real application running on the users on PC (Tk GUI). Only thing the user's going to be able to break is their own PC, that's between them and their IT guys!! If it was for a website or similar public access I'd be more concerned. Graham	[reply]
Re: Validating Regular Expression by kennethk (Abbot) on Feb 15, 2011 at 16:25 UTC
Assuming you don't want to implement a regular expression interpreter, then I'd bet your code is fairly close to optimal. You could simplify your test to `use strict; use warnings; for ('a', '') { print validate($_) ? "Pass: '$_'\n" : "Fail: '$_'\n"; } sub validate{ my $pat = shift; return eval{qr/$pat/}; }` [download] where it returns the compiled regular expression if it succeeds (TRUE) or `undef` if it fails (FALSE). (~~In general~~ IMHO, it's a good idea to just `return;` on failure, since `return 0;` ~~will evaluate to TRUE in list context.~~ will return a non-empty list if the sub is evaluated in list context) Blah blah, security, blah blah, trusted source, blah blah, injection attack, blah blah, perlsec. Update: In deference to ikegami's comment, I have revised the above; however I would point out that Perl Best Practices agrees with me (point 124 - Use a bare return to return failure). I think we can both agree that the interface you specify should depend on how you are going to use it.	[reply] [d/l] [select]
Re^2: Validating Regular Expression by ikegami (Patriarch) on Feb 15, 2011 at 17:06 UTC
In general, it's a good idea to just return; on failure, since return 0; will evaluate to TRUE in list context. First, that's not true. You can't test the truthfulness of a something evaluated in list context, so how can it evaluate to true? I think you are talking about list assignment in scalar context. `sub is_valid { 0 } is_valid() or die "invalid"; # Ok (dies) my $invalid = is_valid() or die "invalid"; # Ok (dies) my ($invalid) = is_valid() or die "invalid"; # XXX (Doesn't die!)` [download] The last is purely the coder doing something wrong. Secondly, I disagree. A boolean is expected, so return that. Otherwise, you create a function that needlessly returns differently when evaluated in scalar and list contexts. You create a function that returns something differently than expected (and documented?). You rely on the caller to create a boolean from the value you do return. For example, you are forcing people to write `%data = ( foo => !!is_foo(), bar => !!is_bar(), );` [download] instead of `%data = ( foo => is_foo(), bar => is_bar(), );` [download] Returning an empty list would cause the buggy assignment in scalar context to DWIM, but it breaks valid code. Update: Improved list assignment in scalar context example.	[reply] [d/l] [select]
Re^2: Validating Regular Expression by ikegami (Patriarch) on Feb 15, 2011 at 17:35 UTC
I think we can both agree that the interface you specify should depend on how you are going to use it. That's entirely my point. One would use the result of `validate` as a boolean, so it should return a boolean. You suggested that it should return something that needs to be converted into a boolean first. Use a bare return to return failure Return an empty list can be useful to signal failure where a false value could be returned. `sub get { $engine->connect() or return (); return $engine->get(); } my $ok = ($data) = get();` [download] That's not relevant here.	[reply] [d/l] [select]
Re^3: Validating Regular Expression by kennethk (Abbot) on Feb 15, 2011 at 17:53 UTC
It only needs to be converted to a boolean first if it is used in a list assignment. Any usage of the sub in scalar context will behave as expected and as the original spec. We are both making assumptions about how "[o]ne would use the result", and any conclusions are subjective and academic.	[reply]
Re^4: Validating Regular Expression by ikegami (Patriarch) on Feb 15, 2011 at 18:15 UTC