in reply to Re: Regexp oddity
in thread Regexp oddity

Adam, that's not quite accurate about the '?'. If a question mark follows a quantifier (*?, +?, {min,max}? or ??) in a regex, it makes that quantifier "non-greedy". Consider the following code.
# 3 spaces, a tab, 3 more spaces, another tab and 3 more spaces
# (represented by chr() for clarity)
$test = chr(32) x 3 . chr(9) . chr(32) x 3 . chr(9) . chr(32) x 3;
($first = $1, $second = $2) if $test =~ /(\s*)\t(\s*)/;
In this case, the first (\s*) will be greedy and attempt to match as many characters as possible. $first will contain 3 spaces, a tab, and 3 more spaces. $second will contain 3 spaces. However, by adding the question mark, we make it non-greedy.
($first = $1, $second = $2) if $test =~ /(\s*?)\t(\s*)/;
This means that (\s*?) attempts the smallest match possible that still satisfies the regex above. In this case, $first contains 3 spaces and $second contains 3 spaces, a tab, and 3 more spaces. The '?' does not mean "aka zero".
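To see the difference for yourself, here is a minimal runnable sketch of the two matches above. The show() helper is my own addition (not part of the original code); it just makes the whitespace visible by printing tabs as \t and spaces as underscores.

```perl
use strict;
use warnings;

# Same test string as above: 3 spaces, tab, 3 spaces, tab, 3 spaces.
my $test = (' ' x 3) . "\t" . (' ' x 3) . "\t" . (' ' x 3);

# Hypothetical helper (not in the original post): make whitespace visible.
sub show {
    my ($s) = @_;
    $s =~ s/\t/\\t/g;   # tab   -> literal backslash-t
    $s =~ s/ /_/g;      # space -> underscore
    return "[$s]";
}

my ($g1, $g2) = $test =~ /(\s*)\t(\s*)/;    # greedy first group
my ($l1, $l2) = $test =~ /(\s*?)\t(\s*)/;   # non-greedy first group

print "greedy:     ", show($g1), " ", show($g2), "\n";
print "non-greedy: ", show($l1), " ", show($l2), "\n";
```

The greedy version grabs everything up to the last tab it can still match against; the non-greedy version stops at the first one.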

Incidentally, most regexes ending in (.*?)$/ (like the one in the original post) have a superfluous ? because there is no way to make that group match any less: the $ forces it to match to the end.
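A quick way to check the superfluous '?' is to run both forms against the same input. This is only a sketch: the sample strings and the leading =\s* are made up for illustration and are not the regex from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample line; it contains no embedded newlines.
my $line = "key =   some value with spaces";

my ($lazy)   = $line =~ /=\s*(.*?)$/;   # non-greedy, but anchored by $
my ($greedy) = $line =~ /=\s*(.*)$/;    # greedy

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
print "same capture\n" if $lazy eq $greedy;

# One reason to say "most" rather than "all": with /s and a trailing
# newline, the two forms no longer agree.
my $nl = "abc\n";
my ($lazy_s)   = $nl =~ /(.*?)$/s;   # stops before the final newline
my ($greedy_s) = $nl =~ /(.*)$/s;    # swallows the newline as well
print "with /s they differ\n" if $lazy_s ne $greedy_s;
```

In the first pair both captures are identical, so the '?' buys nothing there.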

RE: RE: Re: Regexp oddity
by Adam (Vicar) on Jun 21, 2000 at 20:57 UTC
    You are correct, perhaps I should have been more clear. The regex that we were discussing ends with \s*?(.*?)$/; which is somewhat different from your example. Here it is matching the fewest spaces followed by the fewest 'anything but newlines' to the end of the string. Since the . will match white space, the \s*? will match nothing. Always. But thank you for your clarification of the more generic case.
      Good point! I confess that I hadn't seen that. I'm not exactly a slouch when it comes to regex but it seems like every week I come across a new case whose functionality I'll miss if I don't take a second to look at it more carefully. Gotta love regex!
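Adam's point about \s*?(.*?)$ can be checked the same way. This is a rough sketch only: the "name:" prefix and the sample string are assumptions, since the full regex from the original post isn't quoted here, and I've wrapped the \s*? in parentheses just so its match can be inspected.

```perl
use strict;
use warnings;

# Hypothetical input with no embedded newlines.
my $line = "name:    value";

# Non-greedy \s*? followed by (.*?)$ : the dot happily eats the spaces,
# so the \s*? never needs to match anything.
my ($ws_lazy, $rest_lazy) = $line =~ /name:(\s*?)(.*?)$/;

# Greedy \s* for comparison: the spaces are consumed before the capture.
my ($ws_greedy, $rest_greedy) = $line =~ /name:(\s*)(.*?)$/;

printf "lazy:   ws length %d, rest [%s]\n", length($ws_lazy),   $rest_lazy;
printf "greedy: ws length %d, rest [%s]\n", length($ws_greedy), $rest_greedy;
```

So if the intent was to strip leading whitespace from the captured value, the plain greedy \s* is the one that does it.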