in reply to Regexp oddity

It worked for me:
C:\>perl -We "$_='blah, blah2';$OPnot='-';$OPor=',';$OPand='\+'; /^(.*)\s*?($OPnot|$OPor|$OPand)\s*?(.*?)$/; print qq[1='$1', 2='$2', 3='$3']"
1='blah', 2=',', 3=' blah2'
(WinNT, ActiveState 5.6)

BTW: If you want \s*? to match anything, you should remove the question mark. *? will happily match the gap between chars.
(* matches zero or more, but ? tells it to match as little as possible, aka zero.)

RE: Re: Regexp oddity
by Ovid (Cardinal) on Jun 21, 2000 at 05:04 UTC
    Adam, that's not quite accurate about the '?'. If a question mark follows a quantifier (*?, +?, {min, max}? or ??) in a regex, it makes it "non-greedy". Consider the following code.
    # 3 spaces, a tab, 3 more spaces, another tab and 3 more spaces (represented by chr() for clarity)
    $test = chr(32)x3 . chr(9) . chr(32)x3 . chr(9) . chr(32)x3;
    ($first = $1, $second = $2) if $test =~ /(\s*)\t(\s*)/;
    In this case, the first (\s*) will be greedy and attempt to match as many characters as possible. $first will contain 3 spaces, a tab, and 3 more spaces. $second will contain 3 spaces. However, by adding the question mark, we make it non-greedy.
    ($first = $1, $second = $2) if $test =~ /(\s*?)\t(\s*)/;
    This means that (\s*?) attempts the smallest match possible that still satisfies the regex above. In this case, $first contains 3 spaces and $second contains 3 spaces, a tab, and 3 more spaces. The '?' does not mean "aka zero".
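
    For anyone who wants to check the two cases side by side, here is a minimal sketch. It rebuilds the same test string with string literals instead of chr(), and the show() helper is a hypothetical addition that only makes the whitespace visible (spaces as underscores, tabs as \t).

```perl
use strict;
use warnings;

# Same test string as above: 3 spaces, tab, 3 spaces, tab, 3 spaces.
my $test = (' ' x 3) . "\t" . (' ' x 3) . "\t" . (' ' x 3);

# Hypothetical display helper: make spaces and tabs visible.
sub show {
    my ($s) = @_;
    $s =~ s/ /_/g;
    $s =~ s/\t/\\t/g;
    return $s;
}

# Greedy first group: grabs as much as it can, then backs off to the last tab.
my ($greedy_first, $greedy_second) = $test =~ /(\s*)\t(\s*)/;

# Non-greedy first group: grows only until the first tab can match.
my ($lazy_first, $lazy_second) = $test =~ /(\s*?)\t(\s*)/;

printf "greedy: first='%s' second='%s'\n", show($greedy_first), show($greedy_second);
printf "lazy:   first='%s' second='%s'\n", show($lazy_first),   show($lazy_second);
```

    The greedy line should show the tab inside the first capture, the lazy line should show it inside the second.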

    Incidentally, most regexes ending in (.*?)$/ (like the one in the original post) have a superfluous question mark. There is no way to make that final group non-greedy in practice, since it's forced to match to the end.
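
    A quick sketch of that last point, assuming a single-line string and no /s or /m modifiers. The variable names are made up for illustration, and the string is the one from the original one-liner.

```perl
use strict;
use warnings;

my $text = 'blah, blah2';

# Non-greedy tail: has to keep growing until $ can match.
my ($lazy_tail) = $text =~ /,(.*?)$/;

# Greedy tail: runs to the end right away.
my ($greedy_tail) = $text =~ /,(.*)$/;

printf "lazy:   '%s'\n", $lazy_tail;
printf "greedy: '%s'\n", $greedy_tail;
print "same capture\n" if $lazy_tail eq $greedy_tail;
```

    Both captures come out identical here, because the anchor leaves the non-greedy version no shorter match to settle for.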

      You are correct, perhaps I should have been more clear. The regex that we were discussing ends with \s*?(.*?)$/; which is somewhat different from your example. Here it is matching the fewest spaces followed by the fewest 'anything but newlines' to the end of the string. Since the . will match white space, the \s*? will match nothing. Always. But thank you for your clarification of the more generic case.
        Good point! I confess that I hadn't seen that. I'm not exactly a slouch when it comes to regex but it seems like every week I come across a new case whose functionality I'll miss if I don't take a second to look at it more carefully. Gotta love regex!
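
        To see the \s*?(.*?)$ case in action, here is a small sketch. The extra capturing group around the whitespace quantifier is an addition made only to inspect what it matched, and the sample string (with three spaces after the comma) is an assumption for illustration.

```perl
use strict;
use warnings;

my $text = 'blah,   blah2';

# Non-greedy \s*? followed by (.*?)$: the dot is happy to take the spaces,
# so the whitespace group never needs to grow past zero characters.
my ($lazy_gap, $lazy_rest) = $text =~ /,(\s*?)(.*?)$/;

# Greedy \s*: the spaces are consumed before the final group starts.
my ($greedy_gap, $greedy_rest) = $text =~ /,(\s*)(.*?)$/;

printf "lazy:   gap length=%d rest='%s'\n", length($lazy_gap),   $lazy_rest;
printf "greedy: gap length=%d rest='%s'\n", length($greedy_gap), $greedy_rest;
```

        With the non-greedy version the leading spaces end up in the last capture, which matches the 3=' blah2' seen in the original output.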
RE: Re: Regexp oddity
by daemon23 (Initiate) on Jun 21, 2000 at 03:51 UTC
    \s*? is set that way on purpose in case the words have no whitespace between them.
      The question mark in \s*? is not necessary if you are doing that "in case the words have no whitespace between them." The * quantifier matches zero or more of whatever it is quantifying.
      $test = "az"; print "Good\n" if $test =~ /a\s*z/;
      The above regex sees an 'a', followed by zero spaces, followed by a 'z'. Since this matches the value of $test, it prints "Good\n".
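
      A slightly longer sketch along the same lines, comparing \s* and \s*? over a few sample strings. The sample strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my @results;
for my $test ("az", "a z", "a \t z", "a-z") {
    # Plain * already allows zero whitespace characters.
    my $greedy = $test =~ /a\s*z/  ? 1 : 0;
    # The non-greedy form accepts exactly the same strings here.
    my $lazy   = $test =~ /a\s*?z/ ? 1 : 0;
    push @results, "$greedy$lazy";

    (my $shown = $test) =~ s/\t/\\t/g;    # make the tab visible
    printf "%-8s greedy=%d lazy=%d\n", "[$shown]", $greedy, $lazy;
}
```

      Both forms agree on every string, so the question mark changes nothing about whether the pattern matches when there is no whitespace between the words.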

      Cheers!