in reply to regexp: removing extra whitespace
s/\s(?<![ \n])//g; s/ \K +//g; s/\n\n\K\n+//g;
The order of the first two matters (e.g. foo{space}{tab}{space}bar). I gave them in the same order you requested them.
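To see why the order matters, here is a minimal sketch that runs the three substitutions in both orders on the foo{space}{tab}{space}bar example. The helper names `squeeze` and `squeeze_swapped` are hypothetical, made up for this illustration; `\K` needs perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: the three substitutions in the order given above.
sub squeeze {
    my ($s) = @_;
    $s =~ s/\s(?<![ \n])//g;   # delete whitespace other than space and newline
    $s =~ s/ \K +//g;          # collapse each run of spaces to a single space
    $s =~ s/\n\n\K\n+//g;      # keep at most two consecutive newlines
    return $s;
}

# Hypothetical helper: same steps, but with the first two swapped.
sub squeeze_swapped {
    my ($s) = @_;
    $s =~ s/ \K +//g;          # the two spaces are not adjacent yet, so nothing happens
    $s =~ s/\s(?<![ \n])//g;   # deleting the tab now leaves two adjacent spaces
    $s =~ s/\n\n\K\n+//g;
    return $s;
}

my $in = "foo \t bar";
print '[', squeeze($in), "]\n";          # [foo bar]
print '[', squeeze_swapped($in), "]\n";  # [foo  bar]  (two spaces survive)
```

In the swapped order, the tab still separates the two spaces when the space-collapsing step runs, so that step has nothing to collapse.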
I find it odd that foo{tab}bar should become foobar. One usually wants foo{space}bar. To get the latter,
s/(?:\s(?<![ \n]))+/ /g; s/ \K +//g; s/\n\n\K\n+//g;
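A minimal sketch of the replace-with-a-space variant follows. The helper name `normalize` is hypothetical. The first substitution only matches whitespace other than space and newline, so the sketch also runs the `s/ \K +//g` step from the first version to collapse spaces that end up next to each other (as in foo{space}{tab}{space}bar); that extra step rests on the assumption that runs of spaces should still be squeezed to one.

```perl
use strict;
use warnings;

# Hypothetical helper: turn other whitespace into a space instead of deleting it.
sub normalize {
    my ($s) = @_;
    $s =~ s/(?:\s(?<![ \n]))+/ /g;  # each run of non-space, non-newline whitespace -> one space
    $s =~ s/ \K +//g;               # collapse runs of spaces (step taken from the first version)
    $s =~ s/\n\n\K\n+//g;           # keep at most two consecutive newlines
    return $s;
}

print '[', normalize("foo\tbar"), "]\n";    # [foo bar]
print '[', normalize("foo \t bar"), "]\n";  # [foo bar]
```

Without the middle step, the second input would come out with three spaces: the original two plus the one that replaces the tab.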
\s(?<![ \n])
is currently equivalent to
[\x{0009}\x{000B}-\x{000D}\x{0085}\x{00A0}\x{1680}\x{180E}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]
or sometimes the buggy
[\x{0009}\x{000B}-\x{000D}\x{1680}\x{180E}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]
Update: While U+000B is considered a space by Unicode and \p{Space}, it's not considered a space by \s for historical reasons, so \x{000B} doesn't actually belong in the classes above.
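Since the exact set depends on the perl in use, one way to check it is to ask the regex engine directly. The sketch below lists every code point up to U+3000 that `\s(?<![ \n])` matches; the helper name `matched_code_points` is hypothetical, and the `utf8::upgrade` call is my own addition so that characters below U+0100 are matched with Unicode semantics.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the code points that \s(?<![ \n]) matches
# on the perl that runs this script.
sub matched_code_points {
    my @hits;
    for my $cp (0 .. 0x3000) {
        my $ch = chr $cp;
        utf8::upgrade($ch);   # store as UTF-8 internally so \s uses Unicode rules
        push @hits, $cp if $ch =~ /\s(?<![ \n])/;
    }
    return @hits;
}

printf "U+%04X\n", $_ for matched_code_points();
```

Dropping the `utf8::upgrade` line should show how characters such as U+0085 and U+00A0 are treated in strings that are not stored as UTF-8 internally.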