in reply to regex doubt on excluding

Sorry, but the code shown is not removing newlines, as is easily demonstrated:

18:23 >perl -wE "my $s = qq[abc\n\t \ndef\n \ngh]; $s =~ s/^\s*$/X/mg; say $s;"
abc
X
def
X
gh

18:24 >

That’s because the /m modifier lets ^ and $ match at the beginning and end of each line within the string (see perlre#Modifiers). What the code does is remove any whitespace from an otherwise empty line. That is, whitespace is removed if and only if it is the only thing between two newlines (or between the beginning of the string and the first newline, or between the last newline and the end of the string).

Is this what was intended? Or did you want to remove all whitespace except the newlines themselves? If the latter, Laurent_R’s approach is what you want:

18:24 >perl -wE "my $s = qq[abc\n\t \nd\tef\n \ngh]; $s =~ s/[ \t]+/X/mg; say $s;"
abc
X
dXef
X
gh

18:39 >
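
On a shell where double quotes interpolate `$s`, the one-liners would need requoting, so here is a rough script form of the same comparison. The variable names and the `show` helper are only illustrative; the two substitutions are the ones used in the one-liners.

```perl
use strict;
use warnings;

my $s = "abc\n\t \nd\tef\n \ngh";

# Make newlines and tabs visible so the difference is easy to see.
sub show {
    my ($str) = @_;
    $str =~ s/\n/\\n/g;
    $str =~ s/\t/\\t/g;
    return $str;
}

# Replace the whitespace on whitespace-only lines (first one-liner).
(my $blank_lines = $s) =~ s/^\s*$/X/mg;

# Replace every run of spaces and tabs, wherever it occurs (second one-liner).
(my $all_runs = $s) =~ s/[ \t]+/X/g;

print show($blank_lines), "\n";   # abc\nX\nd\tef\nX\ngh
print show($all_runs),    "\n";   # abc\nX\ndXef\nX\ngh
```

The tab inside "d\tef" survives the first substitution and is replaced by the second, which is the whole difference between the two approaches.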

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: regex doubt on excluding
by Anonymous Monk on Apr 21, 2014 at 08:47 UTC
    The intent is to preserve multiple empty lines, but without any whitespace in them.

      Ok, I understand now, and it seems I spoke too soon: the original code is removing some newlines, since it reduces a sequence of successive newlines to a single one.

      I don’t understand how this is working. From perlre#Regular-Expressions:

      By default, the "^" character is guaranteed to match only the beginning of the string, the "$" character only the end (or before the newline at the end), and Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by "^" or "$". You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any newline within the string (except if the newline is the last character in the string), and "$" will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator.

      — but I don’t see how this explains the behaviour we are seeing?

      Can someone please explain what the regex is doing here?

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Well, the ^ can match just after the newline that follows "def", and the $ can match just before the newline that precedes 'gh', and everything between those two positions, newlines included, is greedily accepted by the \s+ and thus eliminated.

        ^ matching *after* a newline means the first newline is not part of the match and so is not eliminated. $ matching *before* a newline means the last newline is not eliminated either.
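
        To make those positions concrete, here is a small sketch that lists the offsets where ^ and $ can match, with and without /m. The test string and the `offsets` helper are made up for illustration.

        ```perl
        use strict;
        use warnings;

        my $s = "ab\ncd\n";   # offsets: a=0 b=1 \n=2 c=3 d=4 \n=5, end=6

        # Return the offsets at which a zero-width pattern matches in $str.
        sub offsets {
            my ($re, $str) = @_;
            my @found;
            push @found, $-[0] while $str =~ /$re/g;
            return "@found";
        }

        my $caret_plain  = offsets(qr/^/,  $s);
        my $caret_multi  = offsets(qr/^/m, $s);
        my $dollar_plain = offsets(qr/$/,  $s);
        my $dollar_multi = offsets(qr/$/m, $s);

        printf "%-10s %s\n", '^',    $caret_plain;    # 0
        printf "%-10s %s\n", '^ /m', $caret_multi;    # 0 3
        printf "%-10s %s\n", '$',    $dollar_plain;   # 5 6
        printf "%-10s %s\n", '$ /m', $dollar_multi;   # 2 5 6
        ```

        With /m, ^ gains offset 3 (just after the embedded newline) and $ gains offset 2 (just before it). Neither anchor consumes the newline itself.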

        Those two newlines make for one blank line between the non-blank lines, and any excess whitespace including newlines between them is removed.
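
        A quick way to check this, assuming the substitution in the original question was of the form s/^\s+$//mg (it is not shown in this part of the thread; the \s+ above suggests it). The sample string is invented for the demonstration.

        ```perl
        use strict;
        use warnings;

        # Three whitespace-only or empty lines between "def" and "gh".
        my $s = "def\n \n\t\n\ngh";

        # Assumed form of the original code: \s+ may span several lines,
        # because \s also matches "\n".
        (my $collapsed = $s) =~ s/^\s+$//mg;

        # Show the result with visible newlines.
        (my $shown = $collapsed) =~ s/\n/\\n/g;
        print "$shown\n";   # def\n\ngh
        ```

        The match starts after the newline that ends "def" and stops before the last newline ahead of "gh", so exactly those two newlines remain and the three lines collapse into one blank line.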

        Thank you Athanasius.

        I got it working with $string =~ s/\s*?\n/\n/mg;
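
        For the record, a short sketch of what that substitution does on a made-up sample string (the string and variable names are only for illustration):

        ```perl
        use strict;
        use warnings;

        # "abc" with trailing spaces, then three whitespace-only or empty lines.
        my $string = "abc  \n\t \n\n \ndef\n";

        # The lazy \s*? takes as little whitespace as possible before each "\n",
        # so it never swallows a newline; every line keeps its own "\n".
        (my $cleaned = $string) =~ s/\s*?\n/\n/mg;

        (my $shown = $cleaned) =~ s/\n/\\n/g;
        print "$shown\n";   # abc\n\n\n\ndef\n
        ```

        All three empty lines are kept, now without whitespace. Two side notes: the pattern also strips trailing whitespace from non-empty lines (as after "abc" here), and since it contains neither ^ nor $, the /m modifier has no effect on it.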