in reply to Re^2: regexp: removing extra whitespace
in thread regexp: removing extra whitespace
Yes, [^\S \n] and \s(?<![ \n]) are equivalent. Well, should be.
Just tried it with my perl (v5.12.2), and [^\S \n] doesn't match \x{0085} and \x{00A0}
Sometimes it won't because of a bug, but that applies to both [^\S \n] and \s(?<![ \n]). See Re: Can I change \s?.
5.12 seems to have another problem on top of that.
5.12:
$ perl -le'print "\x{00A0}" =~ /[^\S \n]/ ?1:0;'
0 # Expected
$ perl -E'say "\x{00A0}" =~ /[^\S \n]/ ?1:0;'
0 # Feature unicode_strings doesn't fix regexes yet.
$ perl -le'print "\N{U+00A0}" =~ /[^\S \n]/ ?1:0;'
0 # Surprised!
$ perl -le'print "\x{2660}\x{00A0}" =~ /[^\S \n]/ ?1:0;'
0 # Surprised!
(The last two are really the same case: both match against an upgraded string.)
Now with what should be an equivalent pattern.
$ perl -le'print "\x{00A0}" =~ /\s(?<![ \n])/ ?1:0;'
0 # Expected
$ perl -E'say "\x{00A0}" =~ /\s(?<![ \n])/ ?1:0;'
0 # Feature unicode_strings doesn't fix regexes yet.
$ perl -le'print "\N{U+00A0}" =~ /\s(?<![ \n])/ ?1:0;'
1 # \N always returns an upgraded string.
$ perl -le'print "\x{2660}\x{00A0}" =~ /\s(?<![ \n])/ ?1:0;'
1 # Forces the use of an upgraded string.
5.14:
$ perl -le'print "\x{00A0}" =~ /[^\S \n]/ ?1:0;'
0 # Bug kept for backwards compatibility
$ perl -E'say "\x{00A0}" =~ /[^\S \n]/ ?1:0;'
1
$ perl -le'print "\N{U+00A0}" =~ /[^\S \n]/ ?1:0;'
1
$ perl -le'print "\x{2660}\x{00A0}" =~ /[^\S \n]/ ?1:0;'
1