Hello SuicideJunkie, and thanks for the answer. Unfortunately, I’m still confused. :-(

From your explanation, I would expect that making the whitespace match non-greedy would prevent the intermediate newline(s) from being eliminated. But it doesn’t (see below). Here is my current understanding (obviously flawed) of what should happen:

^ and $ are zero-width assertions, so when they feature in a match the newline they follow/precede is not substituted. For example:
```
18:14 >perl -wE "my $s = qq[\n\n\n]; my $t = $s =~ s{$}{}gmr; say $s eq $t;"
1

18:14 >
```
\s*? matches zero or more whitespace characters (including newline) non-greedily.
With the /g modifier in effect, whenever a match succeeds the regex engine begins looking for the next match one character past where the last successful match began.

Given these assumptions, I would expect that the regex /^\s*?$/ would match the string "a\n\n\nb" as follows: First, ^ matches after the first newline. Since \s*? is non-greedy, the regex engine looks for the shortest match satisfying \s*?$, and finds it in the zero-length string between the first two newlines. This it replaces with another zero-length string. It then starts looking for the next match with ^ matching after the second newline. Again, it finds and replaces a zero-length string. Finally, ^ matches after the final newline, but no match is found. Result: the string is unchanged. However:

#! perl
use strict;
use warnings;

my $s = "a\n\n\nb";
my $t = $s =~ s{^\s*?$}{}gmr;

printf "%s\n", $s eq $t ? 'success' : 'fail';
print  ">$s<\n";
print  "[$t]\n";

Output:

18:29 >perl 902_SoPW.pl
fail
>a


b<
[a

b]

18:29 >

One of the newlines is being deleted, so my understanding must be wrong somewhere.
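
To see where each individual match falls, the same pattern can be run in a while loop under /g, recording the offsets that Perl stores in @- and @+. This is only a minimal diagnostic sketch; the helper name match_spans is hypothetical and not part of the script above.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return the
# "start-end" offsets of every match of /^\s*?$/gm in the given string.
sub match_spans {
    my ($s) = @_;
    my @spans;

    # In scalar context, m//g resumes from pos($s) on each iteration.
    while ($s =~ /^\s*?$/gm) {
        # $-[0] and $+[0] are the start and end offsets of the last match;
        # equal values mean the match was zero-length.
        push @spans, join '-', $-[0], $+[0];
    }
    return @spans;
}

print "$_\n" for match_spans("a\n\n\nb");
```

Any span whose start and end differ is a match that consumed a character, which is where a deleted newline would have to come from.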

I did try adding use re 'debug'; but I’m only just learning to interpret the output. I think the relevant part is:

...
Guessed: match at offset 0
   2 <a%n> <%n%nb>           |  1:MBOL(2)
   2 <a%n> <%n%nb>           |  2:MINMOD(3)
   2 <a%n> <%n%nb>           |  3:STAR(5)
   2 <a%n> <%n%nb>           |  5:  MEOL(6)
   2 <a%n> <%n%nb>           |  6:  END(0)
Match possible, but length=0 is smaller than requested=1, failing!
                                  POSIXD[\s] can match 1 times out of 1...
   3 <a%n%n> <%nb>           |  5:  MEOL(6)
   3 <a%n%n> <%nb>           |  6:  END(0)
Match successful!
...

which seems to show that the match I would expect (empty line) is rejected, but I don’t know why it is.

What am I missing?

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

In reply to Re^5: regex doubt on excluding by Athanasius
in thread regex doubt on excluding by Anonymous Monk
