Adam, that's not quite accurate about the '?'. If a question mark follows a quantifier (*?, +?, {min,max}? or ??) in a regex, it makes that quantifier "non-greedy". Consider the following code.
# 3 spaces, a tab, 3 more spaces, another tab and 3 more spaces (represented by chr() for clarity)
$test = chr(32)x3 . chr(9) . chr(32)x3 . chr(9) . chr(32)x3;
($first = $1, $second = $2) if $test =~ /(\s*)\t(\s*)/;
In this case, the first (\s*) will be greedy and attempt to match as many characters as possible. $first will contain 3 spaces, a tab, and 3 more spaces. $second will contain 3 spaces. However, by adding the question mark, we make it non-greedy.
($first = $1, $second = $2) if $test =~ /(\s*?)\t(\s*)/;
This means that (\s*?) attempts the smallest match possible that still satisfies the regex above. In this case, $first contains 3 spaces and $second contains 3 spaces, a tab, and 3 more spaces. The '?' does not mean "aka zero".
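For anyone who wants to run the two cases side by side, here is a minimal sketch. The variable names are mine, and it prints capture lengths rather than the captures themselves, since whitespace is hard to read in a terminal.

```perl
use strict;
use warnings;

# Same test string as above: 3 spaces, tab, 3 spaces, tab, 3 spaces.
my $test = chr(32) x 3 . chr(9) . chr(32) x 3 . chr(9) . chr(32) x 3;

# Greedy first group: grabs as much as it can before the last possible tab.
my ($greedy_first, $greedy_second) = $test =~ /(\s*)\t(\s*)/;

# Non-greedy first group: stops at the first tab it can.
my ($lazy_first, $lazy_second) = $test =~ /(\s*?)\t(\s*)/;

printf "greedy: first=%d second=%d\n", length($greedy_first), length($greedy_second);
printf "lazy:   first=%d second=%d\n", length($lazy_first),   length($lazy_second);
# greedy: first=7 second=3
# lazy:   first=3 second=7
```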
Incidentally, most regexes ending in (.*?)$/ (like the one in the original post) have a superfluous ?: the $ anchor forces the group to match through to the end of the string, so making it non-greedy changes nothing.
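A quick way to convince yourself of that last point. This is only a sketch: the sample string and the pattern in front of the group are made up for illustration and are not the regex from the original post.

```perl
use strict;
use warnings;

my $line = "key = some value here";    # hypothetical input

# The group is pinned to the end of the string by $, so the ? has no effect.
my ($lazy)   = $line =~ /=\s(.*?)$/;
my ($greedy) = $line =~ /=\s(.*)$/;

print "lazy:   '$lazy'\n";
print "greedy: '$greedy'\n";
print "same capture\n" if $lazy eq $greedy;
```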
You are correct; perhaps I should have been clearer. The regex that we were discussing ends with \s*?(.*?)$/; which is somewhat different from your example. Here it is matching the fewest spaces followed by the fewest 'anything but newlines' to the end of the string. Since the . will match whitespace, the \s*? will match nothing, at least as long as the rest of the string contains no embedded newline (which . will not match). But thank you for your clarification of the more generic case.
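To make that concrete, here is a small sketch with a made-up input string and a made-up prefix in front of the \s*?(.*?)$ tail (the full original regex is not reproduced here). With the non-greedy \s*?, the whitespace ends up inside the capture; with a plain \s*, it is consumed before the capture starts.

```perl
use strict;
use warnings;

my $line = "name:   value";    # hypothetical input, three spaces after the colon

# \s*? matches nothing, so (.*?) has to swallow the spaces to reach $.
my ($with_lazy_ws)   = $line =~ /name:\s*?(.*?)$/;

# \s* eats the spaces first, leaving only the word for the capture.
my ($with_greedy_ws) = $line =~ /name:\s*(.*?)$/;

print "lazy \\s*?: '$with_lazy_ws'\n";      # '   value'
print "greedy \\s*: '$with_greedy_ws'\n";   # 'value'
```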
Good point! I confess that I hadn't seen that. I'm not exactly a slouch when it comes to regex but it seems like every week I come across a new case whose functionality I'll miss if I don't take a second to look at it more carefully. Gotta love regex!
\s*? is set that way on purpose in case the words have no whitespace between them.
The question mark in \s*? is not necessary if you are doing that "in case the words have no whitespace between them." The * quantifier matches zero or more of whatever it is quantifying.
$test = "az";
print "Good\n" if $test =~ /a\s*z/;
The above regex sees an 'a', followed by zero spaces, followed by a 'z'. Since this matches the value of $test, it prints "Good\n".
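If it helps, here is a slightly longer sketch of the same idea. The extra test strings are my own; the point is that \s* and \s*? both succeed on all of them, including the one with no whitespace at all, so the ? buys nothing for that purpose.

```perl
use strict;
use warnings;

my @rows;
for my $s ("az", "a z", "a \t z") {
    my $plain = ($s =~ /a\s*z/)  ? 1 : 0;   # greedy quantifier
    my $lazy  = ($s =~ /a\s*?z/) ? 1 : 0;   # non-greedy quantifier
    push @rows, [ $s, $plain, $lazy ];
    printf "length %d: \\s* matched=%d, \\s*? matched=%d\n", length($s), $plain, $lazy;
}
```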
Cheers!