Re: inconsistency in whitespace handling

I'd like to point out that, despite the character error in Skeeve's post ("\x2022" is a space (!) followed by "22", oops; a code point above 0xFF needs braces, as in "\x{2022}"), his report is for real.
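To make the escape mix-up concrete, here is a small sketch (the variable names are mine, purely illustrative) showing how Perl parses the two spellings:

```perl
use strict;
use warnings;

# Without braces, "\x" consumes at most two hex digits, so "\x2022" is
# "\x20" (a space) followed by the two literal characters "22".
my $typo = "\x2022";

# With braces any code point works: this is U+2022 BULLET, one character.
my $bullet = "\x{2022}";

# Report the length and the first code point of each string.
printf "%-10s length %d, first code point %d\n", '"\x2022"',   length $typo,   ord $typo;
printf "%-10s length %d, first code point %d\n", '"\x{2022}"', length $bullet, ord $bullet;
```

The first line should report length 3 with first code point 32 (a space); the second, length 1 with code point 8226.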

use strict;
use warnings;

my $latin   = "\xa0";             # nbsp, stored as a single Latin-1 byte
my $unicode = $latin . pack 'U0'; # same character, upgraded to UTF-8 internally

print "Latin 1: ", $latin   =~ /\s/ ? "yes" : "no", "\n";
print "Unicode: ", $unicode =~ /\s/ ? "yes" : "no", "\n";

result:

Latin 1: no
Unicode: yes
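If you are on Perl 5.14 or later, the regex charset modifiers let you pick one behaviour explicitly instead of depending on how the string happens to be stored: /u always applies Unicode rules, and /a restricts \s to ASCII whitespace. A minimal sketch, assuming 5.14+; it uses utf8::upgrade in place of the pack 'U0' trick, and the helper ws_match is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $str matches the compiled regex $re, else 0.
sub ws_match { my ($str, $re) = @_; return $str =~ $re ? 1 : 0 }

my $latin   = "\xa0";         # nbsp, stored as a single Latin-1 byte
my $unicode = $latin;
utf8::upgrade($unicode);      # same character, stored internally as UTF-8

# Default (/d) rules depend on the string's storage; /u and /a do not.
for my $case (['default (/d)', qr/\s/], ['/u', qr/\s/u], ['/a', qr/\s/a]) {
    my ($label, $re) = @$case;
    printf "%-13s Latin 1: %s  Unicode: %s\n", $label,
        ws_match($latin,   $re) ? 'yes' : 'no',
        ws_match($unicode, $re) ? 'yes' : 'no';
}
```

Run as is (no `use v5.12` or later line, since that would switch on the unicode_strings feature and change the default row), it should report no/yes for the default row, yes/yes for /u and no/no for /a.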

Re^2: inconsistency in whitespace handling
by idsfa (Vicar) on May 12, 2005 at 14:40 UTC

True. That's because Unicode changes the definition of whitespace. Until your string goes Unicode (i.e. is stored internally as UTF-8), Perl defines whitespace as:

       \s      A whitespace character      [ \t\n\r\f]

But once the string is in Unicode, Perl honors the character's Unicode White_Space property (which U+00A0 NO-BREAK SPACE has, in this case).
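The property and the internal flag can also be inspected directly. A small sketch, assuming a reasonably modern Perl (5.14+ documents that a \p{...} in the pattern forces Unicode rules): \p{White_Space} asks about the Unicode property itself, and utf8::is_utf8 shows that the two strings differ only in internal representation, not in content. The variable names are illustrative.

```perl
use strict;
use warnings;

my $nbsp     = "\xa0";      # U+00A0 NO-BREAK SPACE, stored as one byte
my $upgraded = $nbsp;
utf8::upgrade($upgraded);   # same character, now stored internally as UTF-8

# A \p{...} property in the pattern makes Perl use Unicode rules,
# whatever the string's internal representation.
my $has_ws = $nbsp =~ /\p{White_Space}/ ? 1 : 0;

# eq compares characters, so the strings are equal; only the flag differs.
my $same      = $nbsp eq $upgraded ? 1 : 0;
my $flag_nbsp = utf8::is_utf8($nbsp)     ? 1 : 0;
my $flag_up   = utf8::is_utf8($upgraded) ? 1 : 0;

print "White_Space: $has_ws, equal: $same, UTF-8 flag: $flag_nbsp vs $flag_up\n";
```

This should print "White_Space: 1, equal: 1, UTF-8 flag: 0 vs 1".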

Updated: The same applies to thundergnat's discovery.

The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. -- Cyrus H. Gordon