    s/(\s+)/$1 eq "\n\n" ? $1 : " "/eg;

That was fine. But I wondered: how should "\t\n\n" or "\n\n\t" be handled? Well, I came up with a truly hideous, yet truly working, regex. One regex. With no /e modifier. One catch: variable-width look-behinds. How did I get around it? Why, sexeger, of course. This was another use of that technique to solve an interesting problem. You see, "\n\n\n" should match the "\n\n" as one unit (leaving it intact) and then treat the remaining "\n" as a chunk to be turned into a single space. However, "\n\n\n\n" should be seen as two "\n\n" units.
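To make the question concrete, here is a small sketch of what the /e substitution quoted above does with a plain paragraph break and with one that has a stray tab in front of it. The helper name `squeeze_simple` is made up for illustration; only the substitution itself comes from the post.

```perl
use strict;
use warnings;

# Hypothetical helper name; the substitution inside is the /e version
# quoted in the post, applied to a copy of the argument.
sub squeeze_simple {
    my ($text) = @_;
    $text =~ s/(\s+)/$1 eq "\n\n" ? $1 : " "/eg;
    return $text;
}

# A bare "\n\n" is the whole whitespace run, so it is left alone.
print squeeze_simple("a\n\nb") eq "a\n\nb" ? "kept\n" : "flattened\n";

# "\t\n\n" is one whitespace run that is not exactly "\n\n",
# so the entire run collapses to a single space.
print squeeze_simple("a\t\n\nb") eq "a b" ? "flattened\n" : "kept\n";
```

Because `\s+` grabs the whole run at once, any extra whitespace touching a paragraph break causes the break to be lost, which is the case the rest of the post sets out to handle.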
The problem is that when I come across a newline, I need to see if it is preceded by an even number of newlines. Ordinarily, I'd say /\n(?=(?:\n\n)*(?!\n))/ to denote "a newline followed by an even number of newlines". Sadly, I can't use that for look-behind: /(?<=(?<!\n)(?:\n\n)*)\n/ doesn't work because it's variable width. Solution? Reverse it.
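As a rough illustration of the reversal trick, the sketch below runs the look-ahead from the paragraph above on the reversed string and maps the match positions back to the original. The helper name `even_before_offsets` is an assumption made for this example and is not part of the post.

```perl
use strict;
use warnings;

# The look-ahead from the post: a newline followed by an even number of newlines.
my $even_after = qr/\n(?=(?:\n\n)*(?!\n))/;

# Hypothetical helper: offsets (in the ORIGINAL string) of newlines that are
# preceded by an even number of newlines, found by matching on the reverse.
sub even_before_offsets {
    my ($text) = @_;
    my $rev = reverse $text;          # scalar context: reverses the string
    my @offsets;
    while ($rev =~ /$even_after/g) {
        # $-[0] is the match start in $rev; convert it to an offset in $text.
        push @offsets, length($text) - 1 - $-[0];
    }
    return sort { $a <=> $b } @offsets;
}

# In "a\n\n\nb" the newlines sit at offsets 1, 2 and 3.
print join(",", even_before_offsets("a\n\n\nb")), "\n";   # prints 1,3
```

Offsets 1 and 3 are preceded by zero and two newlines respectively, while offset 2 is preceded by one, so "followed by" on the reversed string really does answer "preceded by" on the original.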
So I offer this regex:
    ($_ = reverse) =~ s{
      (?:
        [\r\t\f ]+           # non-\n whitespace
        |                    # OR
        (?<!\n)              # not preceded by a \n
        \n                   # match a \n
        (?=                  # that's followed by...
          (?:\n\n)* (?!\n)   # an even number of \n's
        )
      )+                     # one or more times
    }{ }xg;                  # turn it into a single space
    $_ = reverse;

Whew.
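For anyone who wants to try it, here is a minimal sketch that wraps the same substitution in a sub working on a copy rather than on $_. The name `squeeze_ws` and the sample strings are assumptions for illustration; the pattern is the one from the post.

```perl
use strict;
use warnings;

# Hypothetical wrapper name; the pattern is the one from the post,
# applied to a reversed copy of the argument instead of to $_.
sub squeeze_ws {
    my ($text) = @_;
    my $rev = reverse $text;
    $rev =~ s{
        (?:
            [\r\t\f ]+                # non-\n whitespace
          |                           # OR
            (?<!\n)                   # not preceded by a \n
            \n                        # a \n
            (?= (?:\n\n)* (?!\n) )    # followed by an even number of \n's
        )+
    }{ }xg;
    return scalar reverse $rev;       # undo the reversal
}

for my $in ("a\nb", "a\n\nb", "a\n\n\nb", "a\n\n\n\nb", "a\t\n\nb") {
    # Show newlines as a visible \n so the result fits on one line.
    (my $shown = squeeze_ws($in)) =~ s/\n/\\n/g;
    print "$shown\n";
}
```

With these inputs, a lone "\n" becomes a space, "\n\n" and "\n\n\n\n" pass through untouched, "\n\n\n" comes out as "\n\n" followed by a space, and "\t\n\n" comes out as a space followed by "\n\n".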
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;