It's probably best left optional, since the entire
idea of a non-breaking space is, not surprisingly,
to prevent something from breaking. Although admittedly
overused on the Web at large, the principle is to provide
a visual space between two 'words' which aren't meant to
be separated. As such, something like 'Perl Monks' joined
by one should be treated not as two words but as a
single word. Including 0xA0 in \s would defeat the entire
purpose of having it in the first place.
Instead, you could build methods into HTML::Message to
strip out these invisible buggers, which is really only
a single tr/\xA0/\x20/ operation anyway.
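For what it's worth, here is a minimal sketch of that tr/// as a standalone sub. The name nbsp_to_space is made up for illustration and is not part of HTML::Message or any other module; it works on a copy so the caller's string is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns a copy of the
# text with every non-breaking space (0xA0) turned into a plain
# space (0x20).
sub nbsp_to_space {
    my ($text) = @_;
    (my $copy = $text) =~ tr/\xA0/\x20/;
    return $copy;
}

my $clean = nbsp_to_space("Perl\xA0Monks");
print $clean eq "Perl Monks" ? "converted\n" : "unchanged\n";
```

Whether you want this applied by default is the same question as above: once the 0xA0 is gone, nothing stops 'Perl Monks' from being split.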
You might find that 0xA0 isn't the only "invisible"
character out there either, as it depends on the font that
you are using, and will likely vary from UNIX to Windows
to Macintosh. Sometimes, if the font doesn't define a
glyph for that position, it draws nothing: a zero-width
non-character that is there, but not.

That's not a very smart idea. Under a certain commercial operating system that shall remain nameless, 0xa0 maps to á.
I have some scripts that would be seriously bent by such a change in semantics of \s.
OTOH, it would be very nice to be able to define your own
idea of what \s (and cohorts) should represent... I can't count the number of times I've matched
[A-Za-z0-9] because I don't want the underscore. I know I
can match [^_\W] but people find that a little obfuscated around here.
(Clarification: "here" means
where I work, not the monastery.)
<update date="2005-01-08"> Note that the [^_\W] trick does not work as expected with 5.8 when Unicode comes into play...</update>
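For anyone puzzled by the double negative, a small sketch of how [^_\W] behaves on plain ASCII input. The sub name alnum_only is invented for the example, and the /a modifier is my assumption: it needs Perl 5.14 or later and pins \W to its ASCII meaning, which sidesteps the Unicode surprise mentioned in the update rather than explaining it.

```perl
use strict;
use warnings;

# [^_\W] reads as "not an underscore and not a non-word character",
# i.e. any word character except "_". With /a (Perl 5.14+, an
# assumption of this sketch) that is the same set as [A-Za-z0-9].
my $alnum = qr/[^_\W]/a;

# Hypothetical helper: keep only the characters matched by $alnum.
sub alnum_only {
    my ($text) = @_;
    return join '', $text =~ /$alnum/g;
}

print alnum_only("foo_bar-42"), "\n";
```

On older perls without /a, spelling out [A-Za-z0-9] remains the unambiguous choice.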