Only1KW has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to write a regex replace to be run on a line containing a newline and I want the newline character to not be replaced. It seems like \s is matching the newline character, and I have no idea why or how to prevent that. I'm not using '$' or '.' in my regex, so I am not using a 'm' or 's' modifier. Any ideas?

Example script:

#!/usr/bin/perl
use strict;
use warnings;

my $line = "Hi\n";
print length($line) . "\n";
$line =~ s/Hi\s*//;
print length($line) . "\n";

Output I get:

3
0
I'm running v5.18.2 if it matters.

Re: \s matches newline in regex? ([^\S\n])
by tye (Sage) on Jul 22, 2015 at 16:56 UTC
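The title of this reply carries the whole answer: [^\S\n] is a double negative, "neither a non-whitespace character nor a newline", which leaves any whitespace except "\n". A minimal sketch of that class applied to the original script follows; the helper name strip_hi and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove "Hi" and any following whitespace
# other than newline. [^\S\n] excludes non-whitespace AND "\n",
# so what remains is "whitespace that is not a newline".
sub strip_hi {
    my ($line) = @_;
    $line =~ s/Hi[^\S\n]*//;
    return $line;
}

# Illustrative inputs, not taken from the original post.
for my $line ("Hi\n", "Hi \t \n", "Hi \n\nHo\n") {
    my $copy = strip_hi($line);
    printf "%d -> %d\n", length($line), length($copy);
}
```

On Perl 5.10 and later, \h (horizontal whitespace) is a related shorthand, but it is not identical: it also excludes \r and \f, which [^\S\n] still matches.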
Re: \s matches newline in regex?
by toolic (Bishop) on Jul 22, 2015 at 16:32 UTC
    It seems like \s is matching the newline character,
    Yes.

    Tip #9 from the Basic debugging checklist: YAPE::Regex::Explain

    The regular expression:

    (?-imsx:Hi\s*)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      Hi                       'Hi'
    ----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    This works for your simple input:
    $line =~ s/Hi[^\n]*//;
      So if \s matches more than a space, how do I match just a space? While I agree your suggestion works for my simple example, it won't work for my actual workload.

        You can use [ ], or \x{20} or \N{U+0020} I guess.
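A quick way to check these spellings is to run each against a handful of whitespace characters. This is only a sketch; the helper name matched_chars is hypothetical. Note that \U in Perl is the uppercase escape, so the code-point form is written \N{U+0020}.

```perl
use strict;
use warnings;

# Four ways to write "exactly one U+0020 space" in a pattern.
# (Under the /x modifier a bare literal space is ignored, so
# [ ] or an escape is the safer spelling there.)
my %space = (
    'literal'    => qr/ /,
    '[ ]'        => qr/[ ]/,
    '\x{20}'     => qr/\x{20}/,
    '\N{U+0020}' => qr/\N{U+0020}/,
);

# Hypothetical helper: which of the five ASCII whitespace
# characters does the pattern match in full?
sub matched_chars {
    my ($re) = @_;
    return grep { /\A$re\z/ } (" ", "\t", "\n", "\r", "\f");
}

for my $name (sort keys %space) {
    my @hits = map { sprintf '0x%02X', ord } matched_chars($space{$name});
    print "$name matches: @hits\n";
}
```

Each pattern should report only 0x20, whereas \s would match all five characters.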

        If you want just a space use just a space! If you want to match everything except some specific white space characters use a negated character class with \S and the other white space characters you want to keep:

        s/Hi[^\S\n\r\f]+//g

        for example.
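One thing to watch in that pattern is the + quantifier: it needs at least one matching character, so a line where "Hi" is followed directly by the newline is left alone. The sketch below compares + with *; the helper strip_keep_breaks and its inputs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: strip "Hi" plus trailing whitespace while
# keeping \n, \r and \f. $quant selects the quantifier to compare.
sub strip_keep_breaks {
    my ($text, $quant) = @_;
    my $re = $quant eq '+' ? qr/Hi[^\S\n\r\f]+/ : qr/Hi[^\S\n\r\f]*/;
    $text =~ s/$re//g;
    return $text;
}

for my $in ("Hi \t\n", "Hi\n") {
    for my $q ('+', '*') {
        my $out = strip_keep_breaks($in, $q);
        (my $shown = $out) =~ s/\n/\\n/g;   # make the newline visible
        print "$q: [$shown]\n";
    }
}
```

With "Hi\n" as input, the + version finds nothing to replace and returns the string unchanged, while the * version removes "Hi" and keeps the newline.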

        Premature optimization is the root of all job security
Re: \s matches newline in regex?
by stevieb (Canon) on Jul 22, 2015 at 16:41 UTC

    You can also use a positive lookahead in the regex to avoid replacing the newline char.

    $line =~ s/Hi\s*(?=\n)//;

      If there were repeated, contiguous newlines, that would only avoid replacing the final newline.

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $line = qq{HeHi\n\n\nHo};
       print qq{[$line]};
       ;;
       $line =~ s/Hi\s*(?=\n)//;
       print qq{[$line]};
      "
      [HeHi


      Ho]
      [He
      Ho]


      Give a man a fish:  <%-(-(-(-<

        Yes, I'm aware of that, but it would have been prudent for me to have pointed that out.

        Thanks AnomalousMonk.

        -stevieb

Re: \s matches newline in regex?
by kroach (Pilgrim) on Jul 22, 2015 at 17:41 UTC
    You can use non-greedy matching and $ to match whitespace up until a newline.
    $line =~ s/Hi\s*?$//m;
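Because \s*? grows only as far as it must and, under /m, $ can match just before any newline, this version stops at the first newline even when several follow. A small sketch, with a hypothetical helper name and made-up inputs:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution above.
# \s*? expands one character at a time until $ can match;
# with /m, $ matches just before any "\n" (or at end of string).
sub strip_to_eol {
    my ($text) = @_;
    $text =~ s/Hi\s*?$//m;
    return $text;
}

for my $in ("Hi\n", "Hi  \n", "HeHi \n\n\nHo") {
    my $out = strip_to_eol($in);
    (my $shown = $out) =~ s/\n/\\n/g;   # make newlines visible
    print "[$shown]\n";
}
```

For the three-newline input, all three newlines survive, which is where this differs from the lookahead approach that keeps only the last one.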