Hello SuicideJunkie, and thanks for the answer. Unfortunately, I’m still confused. :-(
From your explanation, I would expect that making the whitespace match non-greedy would prevent the intermediate newline(s) from being eliminated. But it doesn’t (see below). Here is my current understanding (obviously flawed) of what should happen:
^ and $ are zero-width assertions, so when they feature in a match, the newline they follow or precede is not substituted. For example:
```
18:14 >perl -wE "my $s = qq[\n\n\n]; my $t = $s =~ s{$}{}gmr; say $s eq $t;"
1
18:14 >
```
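The one-liner only shows that nothing changes. To see where the zero-width $ matches actually land in "\n\n\n", one option is to record pos() after each m{$}gm match. A small sketch (the variable names are mine, just for illustration):

```perl
use strict;
use warnings;

my $s = "\n\n\n";
my @where;

# Under /m, $ matches before every newline and at the end of the string.
# Each match is zero-width, so pos() is simply the offset where it was found.
push @where, pos($s) while $s =~ m{$}gm;

print "@where\n";
```

This should print 0 1 2 3: four zero-width matches, one before each newline and one at the end of the string, none of which consumes a character.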
\s*? matches zero or more whitespace characters (including newline) non-greedily.
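For comparison, here is a sketch of the non-greedy and greedy forms anchored the same way (again, the names are only illustrative):

```perl
use strict;
use warnings;

my $s = "\n\n\n";

# With /m, $ also matches just before each newline.
my ($lazy)   = $s =~ m{^(\s*?)$}m;   # shortest whitespace run that reaches a $
my ($greedy) = $s =~ m{^(\s*)$}m;    # longest whitespace run that still ends at a $

printf "lazy:   %d chars\n", length $lazy;
printf "greedy: %d chars\n", length $greedy;
```

The lazy capture should be empty (length 0), because $ already matches at offset 0 in front of the first newline. The greedy capture should swallow all three newlines (length 3).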
Given these assumptions, I would expect the regex /^\s*?$/ (with the /m modifier) to match the string "a\n\n\nb" as follows. First, ^ matches after the first newline. Since \s*? is non-greedy, the regex engine looks for the shortest match satisfying \s*?$, and finds it in the zero-length string between the first two newlines, which it replaces with another zero-length string. It then starts looking for the next match, with ^ matching after the second newline. Again, it finds and replaces a zero-length string. Finally, ^ matches after the last newline, but no match is found there, because the next character is b. Result: the string is unchanged. However:
```perl
#! perl
use strict;
use warnings;

my $s = "a\n\n\nb";
my $t = $s =~ s{^\s*?$}{}gmr;

printf "%s\n", $s eq $t ? 'success' : 'fail';
print ">$s<\n";
print "[$t]\n";
```
Output:
```
18:29 >perl 902_SoPW.pl
fail
>a


b<
[a

b]
18:29 >
```
One of the newlines is being deleted, so my understanding must be wrong somewhere.
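One way to pin down exactly which characters get removed is to run the same substitution with /e and log the offset and length of each match from inside the replacement. A sketch, assuming Perl 5.14+ for /r (the @log array and the "offset:length" format are my own, purely for illustration):

```perl
use strict;
use warnings;

my $s = "a\n\n\nb";
my @log;    # hypothetical helper: one "offset:length" entry per match

# Same pattern and flags as the script above, plus /e so the replacement part
# runs as code: record where the current match starts ($-[0]) and how long
# it is ($+[0] - $-[0]), then return '' just like the original replacement.
my $t = $s =~ s{^\s*?$}{ push @log, sprintf('%d:%d', $-[0], $+[0] - $-[0]); '' }gmer;

print "$_\n" for @log;
```

On this input it should log 2:0, 2:1 and 3:0: a zero-length match at offset 2, then a one-character match at offset 2 (the newline that disappears), then another zero-length match at offset 3.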
I did try adding use re 'debug'; but I’m only just learning to interpret the output. I think the relevant part is:
```
...
Guessed: match at offset 0
   2 <a%n> <%n%nb>           |   1:MBOL(2)
   2 <a%n> <%n%nb>           |   2:MINMOD(3)
   2 <a%n> <%n%nb>           |   3:STAR(5)
   2 <a%n> <%n%nb>           |   5:  MEOL(6)
   2 <a%n> <%n%nb>           |   6:  END(0)
                                  Match possible, but length=0 is smaller than requested=1, failing!
                                  POSIXD[\s] can match 1 times out of 1...
   3 <a%n%n> <%nb>           |   5:  MEOL(6)
   3 <a%n%n> <%nb>           |   6:  END(0)
Match successful!
...
```
which seems to show that the match I would expect (the empty line) is rejected, but I don’t know why.
What am I missing?
Athanasius <°(((><   contra mundum   Iustus alius egestas vitae, eros Piratica,
In reply to Re^5: regex doubt on excluding, by Athanasius, in thread regex doubt on excluding, by Anonymous Monk