mdi has asked for the wisdom of the Perl Monks concerning the following question:

Here's the regex I'm using:

s/(?<=\|)\.?\s*(?=\||$)/\\N/g;

The problem: if a line in the file ends with '| ', the space is correctly replaced with '\N', but the newline is also consumed by the substitution, so the line below gets joined onto the line above. I've checked that there is actually a newline in the file.

I'm sure it's probably something simple, but I can't seem to figure out what's causing this behavior.
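
The behavior described in the question can be reproduced with a minimal sketch. The sample line and variable names are hypothetical, chosen only for illustration; they are not taken from the original file.

```perl
use strict;
use warnings;

# Hypothetical sample line: '|'-separated fields, last field is a single space.
my $line = "a|.|b| \n";

# The substitution from the question, applied to a copy of the line.
(my $result = $line) =~ s/(?<=\|)\.?\s*(?=\||$)/\\N/g;

# Report whether the trailing newline survived the substitution.
my $kept_newline = $result =~ /\n\z/ ? "yes" : "no";
print "result: $result\n";
print "newline kept: $kept_newline\n";
```

Because \s* is greedy and "\n" counts as whitespace, it takes both the space and the newline, and "$" then matches at the very end of the string.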

Re: Regex substitution eating newlines
by Roy Johnson (Monsignor) on May 10, 2005 at 11:41 UTC
    Get rid of the "|$" portion. The "$" matches at the end of the line, and a newline counts as whitespace, so the greedy \s* swallows it before "$" matches at the very end of the string. Or, if you do want spaces (but not the newline) removed, don't use \s. Use a literal space character in your regex, or [ \t] to match space and tab. Whatever you want to match.

    tlm correctly pointed out to me that there's more to the replacement than whitespace, so getting rid of |$ will prevent a desired replacement.
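
    A minimal sketch of the [ \t] suggestion, keeping "|$" in place. The sample line and variable names are hypothetical, not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample line: last field is a single space, then a newline.
my $line = "a|.|b| \n";

# Same pattern as in the question, but with [ \t] in place of \s,
# so the newline can never be part of the match.
(my $result = $line) =~ s/(?<=\|)\.?[ \t]*(?=\||$)/\\N/g;

print $result;
```

    Here "$" still matches just before the final newline, so the last field is replaced while the newline stays where it was.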


    Caution: Contents may have been coded under pressure.
      Thanks. I didn't realize newlines were considered whitespace.
Re: Regex substitution eating newlines
by tlm (Prior) on May 10, 2005 at 12:06 UTC

    You could replace "|$" with "|\n\z"; the "\z" matches only at the very end of the string, irrespective of newlines, so the lookahead requires the newline to be left outside the match.

    Update: originally I had "|\n\z" as the proposed replacement for "|$", but right after posting I realized that the preceding "\s*" made the \n superfluous.

    Update 2: Sheesh. My original was right after all; leaving out the "\n" would result in the same problem that the OP posted about in the first place. Hypoglycemia is setting in; time to get me some breakfast.
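
    A minimal sketch of the "|\n\z" variant, run on two hypothetical sample lines (one with a trailing newline, one without). The data and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the same fields, with and without a trailing newline.
my $with_newline    = "a|.|b| \n";
my $without_newline = "a|.|b| ";

# The pattern from the question with "|$" replaced by "|\n\z".
my $pattern = qr/(?<=\|)\.?\s*(?=\||\n\z)/;

(my $fixed        = $with_newline)    =~ s/$pattern/\\N/g;
(my $unterminated = $without_newline) =~ s/$pattern/\\N/g;

print "with newline:    $fixed";
print "without newline: $unterminated\n";
```

    The greedy \s* first grabs the space and the newline, fails the lookahead, then backtracks to just the space, where "\n\z" matches. One thing to watch: a final field on a line with no trailing newline is left untouched by this pattern, which may matter for the last line of a file.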

    the lowliest monk

Re: Regex substitution eating newlines
by thcsoft (Monk) on May 10, 2005 at 11:37 UTC
    is "\N" a typo?

    language is a virus from outer space.