in reply to Spaced Out

Very enlightening. I wonder, though, whether future versions of Perl ought to recognize 0xA0 as a member of \s.
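For anyone who wants to check how their own perl treats 0xA0, here is a minimal sketch. It assumes Perl 5.14 or later, since it relies on the /a and /u regex modifiers, and the helper names are made up for illustration. Under ASCII rules (/a) the character is not whitespace; under Unicode rules (/u) it is.

```perl
use strict;
use warnings;
use 5.014;    # the /a and /u regex modifiers need Perl 5.14 or later

# Hypothetical helper names, used only for this illustration.
sub is_space_ascii   { return $_[0] =~ /\A\s\z/a ? 1 : 0 }
sub is_space_unicode { return $_[0] =~ /\A\s\z/u ? 1 : 0 }

my $nbsp = "\xA0";    # the non-breaking space discussed in this thread

# /a restricts \s to ASCII whitespace; /u applies Unicode rules.
printf "ASCII rules (/a):   %d\n", is_space_ascii($nbsp);
printf "Unicode rules (/u): %d\n", is_space_unicode($nbsp);
```

Being explicit about the modifier keeps the choice in the hands of the script author rather than changing what \s means for everyone.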

Re^2: Spaced Out
by tadman (Prior) on Feb 24, 2001 at 04:13 UTC
    It's probably best left optional, since the whole point of a non-breaking space is, not surprisingly, to prevent something from breaking. Although admittedly overused on the Web at large, its purpose is to provide a visual space between two 'words' which aren't meant to be separated. As such, something like 'Perl Monks' should be treated not as two words but as a single word. Including 0xA0 in \s would defeat the entire purpose of having &nbsp; in the first place.

    Instead, you could build methods into HTML::Message to strip out these invisible buggers, which is really only a single tr/\xA0/\x20/ operation anyway.
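A minimal sketch of that cleanup, for what it's worth. The sub name strip_nbsp is hypothetical and is not part of any module mentioned in this thread; it only wraps the tr/// operation described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from any existing module.
sub strip_nbsp {
    my ($text) = @_;
    # Turn each non-breaking space (0xA0) into an ordinary space (0x20).
    $text =~ tr/\xA0/\x20/;
    return $text;
}

my $raw   = "Perl\xA0Monks";
my $clean = strip_nbsp($raw);

# After the cleanup, a plain \s+ split sees two words.
my @words = split /\s+/, $clean;
print scalar(@words), " words: @words\n";
```

Because the sub works on a copy, the caller still has the original string if the non-breaking spaces matter elsewhere.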

    You might find that 0xA0 isn't the only "invisible" character out there either, as it depends on the font that you are using, and will likely vary from UNIX to Windows to Macintosh. Sometimes if the font doesn't have a defined character for that position, it draws nothing, a zero width non-character that is there, but not.
don't change \s semantics
by grinder (Bishop) on Feb 23, 2001 at 15:29 UTC

    That's not a very smart idea. Under a certain commercial operating system that shall remain nameless, 0xa0 maps to á.

    I have some scripts that would be seriously bent by such a change in semantics of \s.

    OTOH, it would be very nice to be able to define your own idea of what \s (and cohorts) should represent... I can't count the number of times I match [A-Za-z0-9] because I don't want the underscore. I know I can match [^_\W] but people find that a little obfuscated around here. (clarification: where here means where I work, not the monastery).

    <update date="2005-01-08"> Note that the [^_\W] trick does not work as expected with 5.8 when Unicode comes into play...</update>
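To make the [^_\W] caveat concrete, here is a small sketch. It assumes Perl 5.14 or later for the /a and /u modifiers (5.8 had neither, so this shows the same effect with modern syntax), and the helper names are hypothetical. With ASCII rules the negated class agrees with [A-Za-z0-9]; with Unicode rules it also accepts letters such as U+00E9.

```perl
use strict;
use warnings;
use 5.014;    # /a and /u need Perl 5.14 or later

# Hypothetical helpers for comparing the two spellings.
sub explicit_class { return $_[0] =~ /\A[A-Za-z0-9]\z/ ? 1 : 0 }
sub negated_ascii  { return $_[0] =~ /\A[^_\W]\z/a     ? 1 : 0 }
sub negated_uni    { return $_[0] =~ /\A[^_\W]\z/u     ? 1 : 0 }

# The last sample is U+00E9, a letter outside the ASCII range.
for my $ch ('a', '7', '_', '-', "\x{E9}") {
    printf "U+%04X explicit=%d ascii=%d unicode=%d\n",
        ord($ch), explicit_class($ch), negated_ascii($ch), negated_uni($ch);
}
```

So if the intent really is "ASCII letters and digits only", either spell the class out or pin the negated form down with /a.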