Hello SuicideJunkie, and thanks for the answer. Unfortunately, I’m still confused. :-(

From your explanation, I would expect that making the whitespace match non-greedy would prevent the intermediate newline(s) from being eliminated. But it doesn’t (see below). Here is my current understanding (obviously flawed) of what should happen:

Given these assumptions, I would expect the regex /^\s*?$/ to match the string "a\n\n\nb" as follows: First, ^ matches after the first newline. Since \s*? is non-greedy, the regex engine looks for the shortest match satisfying \s*?$, and finds it in the zero-length string between the first two newlines. This it replaces with another zero-length string. It then starts looking for the next match, with ^ matching after the second newline. Again, it finds and replaces a zero-length string. Finally, ^ matches after the final newline, but no match is found there. Result: the string is unchanged. However:

#! perl
use strict;
use warnings;

my $s = "a\n\n\nb";
my $t = $s =~ s{^\s*?$}{}gmr;

printf "%s\n", $s eq $t ? 'success' : 'fail';
print ">$s<\n";
print "[$t]\n";

Output:

18:29 >perl 902_SoPW.pl
fail
>a


b<
[a

b]
18:29 >

One of the newlines is being deleted, so my understanding must be wrong somewhere.
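One way to check which matches the /g loop actually makes is to collect them one at a time, recording each match's offset and length. This is only a minimal sketch: match_spans is a hypothetical helper name, not something from this thread, and it uses a plain m//g loop in scalar context in place of the substitution, on the assumption that both step through the string in the same way.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return a list of
# [start, end] offsets, one pair for each match that a /g loop
# finds for /^\s*?$/m in the given string.
sub match_spans {
    my ($str) = @_;
    my @spans;
    while ($str =~ /^\s*?$/mg) {
        # $-[0] and $+[0] hold the start and end offsets of the last match
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

# Report every match found in the test string from the script above
for my $span (match_spans("a\n\n\nb")) {
    my ($start, $end) = @$span;
    printf "match at offset %d, length %d\n", $start, $end - $start;
}
```

If any reported match has a non-zero length, then that is where a newline is being consumed, and the offsets can be compared directly against the positions shown in the debug trace.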

I did try adding use re 'debug'; but I’m only just learning to interpret the output. I think the relevant part is:

...
Guessed: match at offset 0
   2 <a%n> <%n%nb> | 1:MBOL(2)
   2 <a%n> <%n%nb> | 2:MINMOD(3)
   2 <a%n> <%n%nb> | 3:STAR(5)
   2 <a%n> <%n%nb> | 5: MEOL(6)
   2 <a%n> <%n%nb> | 6: END(0)
Match possible, but length=0 is smaller than requested=1, failing!
                     POSIXD[\s] can match 1 times out of +1...
   3 <a%n%n> <%nb> | 5: MEOL(6)
   3 <a%n%n> <%nb> | 6: END(0)
Match successful!
...

which seems to show that the match I would expect (empty line) is rejected, but I don’t know why it is.

What am I missing?

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


In reply to Re^5: regex doubt on excluding by Athanasius
in thread regex doubt on excluding by Anonymous Monk
