Re: Regexp oddity
by chromatic (Archbishop) on Jun 21, 2000 at 07:24 UTC
|
If you don't have the end-of-line anchor ($) in your regex in the program, $3 will contain nothing. Otherwise, it will contain " blah2".
If you make the last parenthesized match greedy by removing the trailing question mark, you can leave out the final $.
I recommend using a line like the following to show what you've captured, just in case you have whitespace:
print "1: ->$1<-\n2: ->$2<-\n3: ->$3<-\n";
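A minimal sketch of the three cases described above. The operator values ('-', ',' and '\+') are an assumption, since the original poster's settings may differ, and the helper name third_capture is made up for illustration:

```perl
use strict;
use warnings;

# Assumed operator patterns; the original poster's values may differ.
my ($OPnot, $OPor, $OPand) = ('-', ',', '\+');
my $ops = "$OPnot|$OPor|$OPand";

# Hypothetical helper: return $3 for a given pattern, or undef on no match.
sub third_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $3 : undef;
}

my $str = 'blah, blah2';

# Non-greedy last group, no anchor: nothing forces (.*?) to grow.
my $lazy_no_anchor   = third_capture(qr/^(.*)\s*?($ops)\s*?(.*?)/,  $str);
# Non-greedy last group with $: the anchor forces it to reach the end.
my $lazy_anchored    = third_capture(qr/^(.*)\s*?($ops)\s*?(.*?)$/, $str);
# Greedy last group, no anchor: (.*) runs to the end on its own.
my $greedy_no_anchor = third_capture(qr/^(.*)\s*?($ops)\s*?(.*)/,   $str);

print "lazy, no anchor:   ->$lazy_no_anchor<-\n";
print "lazy, anchored:    ->$lazy_anchored<-\n";
print "greedy, no anchor: ->$greedy_no_anchor<-\n";
```

With these assumed values, the first line shows an empty capture and the other two show " blah2", leading space included.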
Re: Regexp oddity
by Adam (Vicar) on Jun 21, 2000 at 03:37 UTC
|
C:\>perl -We "$_='blah, blah2';$OPnot='-';$OPor=',';$OPand='\+';
/^(.*)\s*?($OPnot|$OPor|$OPand)\s*?(.*?)$/; print qq[1='$1', 2='$2', 3='$3']"
1='blah', 2=',', 3=' blah2'
(WinNT, ActiveState 5.6)
BTW: If you want \s*? to match anything, you should remove the question mark. *? will happily match the gap between chars. (* matches zero or more, but ? tells it to match as little as possible, aka zero.)
|
|
Adam, that's not quite accurate about the '?'. If a question mark follows a quantifier (*?, +?, {min, max}? or ??) in a regex, it makes it "non-greedy". Consider the following code.
# 3 spaces, a tab, 3 more spaces, another tab and 3 more spaces (represented by chr() for clarity)
$test = chr(32)x3 . chr(9) . chr(32)x3 . chr(9) . chr(32)x3;
($first = $1, $second = $2) if $test =~ /(\s*)\t(\s*)/;
In this case, the first (\s*) will be greedy and attempt to match as many characters as possible. $first will contain 3 spaces, a tab, and 3 more spaces. $second will contain 3 spaces. However, by adding the question mark, we make it non-greedy.
($first = $1, $second = $2) if $test =~ /(\s*?)\t(\s*)/;
This means that (\s*?) attempts the smallest match possible that still satisfies the above regex. In this case, $first contains 3 spaces and $second contains 3 spaces, a tab, and 3 more spaces. The '?' does not mean "aka zero".
Incidentally, most regexes ending in (.*?)$/ (like the one in the original post) have a superfluous ?, because that last group cannot behave non-greedily when the $ forces it to match to the end.
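To make the captures visible, here is a small sketch that reuses the test string from this post and prints spaces as 's' and tabs as 't'. The helper show and the "foo bar" sample string are illustrative additions, not part of the original code:

```perl
use strict;
use warnings;

# Same test string as above: 3 spaces, tab, 3 spaces, tab, 3 spaces.
my $test = chr(32) x 3 . chr(9) . chr(32) x 3 . chr(9) . chr(32) x 3;

# Hypothetical helper: render spaces as 's' and tabs as 't'.
sub show {
    my ($s) = @_;
    (my $v = $s) =~ tr/ \t/st/;
    return $v;
}

my ($g1, $g2) = $test =~ /(\s*)\t(\s*)/;    # greedy first group
my ($l1, $l2) = $test =~ /(\s*?)\t(\s*)/;   # non-greedy first group

print "greedy:     first=", show($g1), " second=", show($g2), "\n";
print "non-greedy: first=", show($l1), " second=", show($l2), "\n";

# With a trailing $, the lazy and greedy forms capture the same text.
my ($lazy_tail)   = 'foo bar' =~ /(.*?)$/;
my ($greedy_tail) = 'foo bar' =~ /(.*)$/;
print "same tail\n" if $lazy_tail eq $greedy_tail;
```

This should print first=ssstsss second=sss for the greedy case, first=sss second=ssstsss for the non-greedy case, and then "same tail".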
|
|
You are correct; perhaps I should have been clearer. The regex that we were discussing ends with \s*?(.*?)$/; which is somewhat different from your example. Here it is matching the fewest spaces followed by the fewest 'anything but newlines' to the end of the string. Since the . will match white space, the \s*? will match nothing. Always. But thank you for your clarification of the more generic case.
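One way to check this is to wrap the \s*? in its own capture group. The sketch below uses a made-up input string and simplified patterns, not the original program:

```perl
use strict;
use warnings;

my $tail = ', blah2';   # illustrative input: separator, one space, then a word

# Lazy \s*? followed by (.*?)$: the dot absorbs the space, so \s*? stays empty.
my ($lazy_ws, $lazy_rest) = $tail =~ /,(\s*?)(.*?)$/;

# Greedy \s* takes the space first, leaving only the word for the last group.
my ($greedy_ws, $greedy_rest) = $tail =~ /,(\s*)(.*?)$/;

print "lazy:   ws='$lazy_ws' rest='$lazy_rest'\n";
print "greedy: ws='$greedy_ws' rest='$greedy_rest'\n";
```

The lazy version reports ws='' and rest=' blah2', while the greedy version reports ws=' ' and rest='blah2'.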
|
|
|
|
\s*? is set that way on purpose in case the words have no whitespace between them.
|
|
The question mark in \s*? is not necessary if you are doing that "in case the words have no whitespace between them." The * quantifier matches zero or more of whatever it is quantifying.
$test = "az";
print "Good\n" if $test =~ /a\s*z/;
The above regex sees an 'a', followed by zero spaces, followed by a 'z'. Since this matches the value of $test, it prints "Good\n".
Cheers!
Re: Regexp oddity
by daemon23 (Initiate) on Jun 21, 2000 at 20:49 UTC
|
My thanks to everyone who wrote back on this--I finally
figured it out.
$OPand was set to '\s+', as this was how the input string is
set. This is also why I was using \s*?, believing it would
capture any whitespace in the case of the $OPor or $OPnot
separators. However, perl was evaluating "blah, blah2" and
returning $1 = 'blah,', $2 = ' ', and $3 = 'blah2'. The
reason I erroneously assumed the regexp was destroying 'blah2'
is the script loops, evaluating $1 as the test string.
The handler for ($2 =~ /^\s+$/) was written
incorrectly, so the script just skipped over the first
iteration. It worked on the second iteration, however,
evaluating "blah,", and thus finding $1 = "blah", $2 = ",", and $3 = "".
Again, thanks for the pointers--they definitely helped me figure out what I'd done incorrectly.
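The two passes described above can be reproduced with a short sketch. Only the '\s+' value for $OPand comes from this post; the other operator values, the helper name split_on_op, and the way the second pass is driven are assumptions, since the original script was not shown:

```perl
use strict;
use warnings;

# '\s+' for $OPand is from the post; the other two values are assumed.
my ($OPnot, $OPor, $OPand) = ('-', ',', '\s+');

# Hypothetical helper: split a string into (left, operator, right)
# using the regex from this thread.
sub split_on_op {
    my ($str) = @_;
    return $str =~ /^(.*)\s*?($OPnot|$OPor|$OPand)\s*?(.*?)$/;
}

my @first  = split_on_op('blah, blah2');   # first pass
my @second = split_on_op($first[0]);       # second pass re-tests the old $1

print join('|', map { "'$_'" } @first),  "\n";
print join('|', map { "'$_'" } @second), "\n";
```

Under these assumptions the first pass yields 'blah,'|' '|'blah2' and the second yields 'blah'|','|'', which matches the values reported in the post.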
Re: Regexp oddity
by btrott (Parson) on Jun 21, 2000 at 03:41 UTC
|
What are $OPnot and $OPand equal to when you use the
regexp? That will make a difference.