in reply to Replacing symbols in a string

Depending on the encoding your string is in, the easiest approach is to use either \W (for ASCII) or [[:^alpha:]] (for Unicode?). See perlop for the s/// operator and perlre for the character escapes used.

s/\W/ /g;

Update: Fixed incorrect usage of the POSIX character class, as spotted by AnomalousMonk. It's [[:alpha:]], not [:alpha:]. This error is also mentioned in perlre; it produces a warning and matches only :, a, l, p or h.
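
To make the difference between the two classes concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; note that \W leaves digits and underscores alone, while the negated alpha class replaces them too.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen to contain digits, an underscore
# and some punctuation.
my $text = "x42-foo_bar, baz";

# Copy-then-substitute idiom, so $text itself stays untouched.
(my $non_word  = $text) =~ s/\W/ /g;             # keeps letters, digits, underscore
(my $non_alpha = $text) =~ s/[^[:alpha:]]/ /g;   # keeps letters only

print "[$non_word]\n";    # [x42 foo_bar  baz]
print "[$non_alpha]\n";   # [x   foo bar  baz]
```

Which of the two is "right" depends on whether digits and underscores count as symbols in your data.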

Replies are listed 'Best First'.
Re^2: Replacing symbols in a string
by JavaFan (Canon) on Jul 16, 2011 at 17:59 UTC
    \W (for ASCII)
    Eh, no, well, maybe, perhaps, who can tell?

    \W uses Unicode semantics if the string is internally in UTF-8 format, if the pattern contains non-Latin-1 characters (well, sometimes, not always), or, in 5.14, if you use the /u modifier. Otherwise, if a locale is in effect, or, in 5.14, if you use the /l modifier, it uses the semantics of whatever locale is in effect. The /l modifier overrides the heuristics that would otherwise trigger Unicode semantics. And otherwise, or if either the /a or the /aa modifier is in effect, ASCII semantics are used.
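
    A rough sketch of those rules, assuming Perl 5.14 or later, an ASCII platform, and no locale in effect. The variable names are invented for the example. There is deliberately no "use 5.014;" line, since that would switch on unicode_strings and change the default cases.

```perl
use strict;
use warnings;

# The same character (e-acute, code point 0xE9) in two internal formats.
my $native   = "\x{E9}";          # plain 8-bit string
my $upgraded = "\x{E9}";
utf8::upgrade($upgraded);         # same character, now UTF-8 internally

# Default heuristics: the internal format decides.
my $default_native   = $native   =~ /\W/ ? 1 : 0;   # ASCII rules: not a word char
my $default_upgraded = $upgraded =~ /\W/ ? 1 : 0;   # Unicode rules: a letter

# Explicit modifiers override the heuristics.
my $forced_ascii   = $upgraded =~ /\W/a ? 1 : 0;    # /a forces ASCII rules
my $forced_unicode = $native   =~ /\W/u ? 1 : 0;    # /u forces Unicode rules

print "default, native:   $default_native\n";     # 1
print "default, upgraded: $default_upgraded\n";   # 0
print "/a on upgraded:    $forced_ascii\n";       # 1
print "/u on native:      $forced_unicode\n";     # 0
```

    The two strings compare equal with eq, yet the unmodified pattern treats them differently, which is why spelling out /a or /u is the safer habit on 5.14.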

    Oh, and then there's use feature 'unicode_strings'; that may trigger the first case when you don't expect it. use 5.012; will enable it.
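
    A small sketch of that surprise, again assuming Perl 5.14 or later (as far as I know, 5.12 accepted the feature but did not yet apply it to regex matching) and no locale. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $e_acute = "\x{E9}";   # 8-bit string, not UTF-8 internally

# Without the feature: ASCII rules, so e-acute counts as a non-word char.
my $without_feature = $e_acute =~ /\W/ ? 1 : 0;

my $with_feature;
{
    # The feature is lexically scoped; patterns compiled inside this
    # block get Unicode rules even for 8-bit strings.
    use feature 'unicode_strings';
    $with_feature = $e_acute =~ /\W/ ? 1 : 0;
}

print "without unicode_strings: $without_feature\n";   # 1
print "with unicode_strings:    $with_feature\n";      # 0
```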

    [^:alpha:] (for Unicode?).
    [:alpha:] is a POSIX class. It should never match any code point larger than 255 - but Perl has f*cked this one up. I think this is fixed by using one of the new 5.14 regexp modifiers, but I'm not quite sure.
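
    For what it's worth, the /a modifier in 5.14 does restrict the POSIX classes to the ASCII range. A minimal sketch, assuming Perl 5.14 or later; the Greek letter is just an arbitrary code point above 255.

```perl
use strict;
use warnings;

my $greek_alpha = "\x{3B1}";   # GREEK SMALL LETTER ALPHA, code point 945

# Default: the string is UTF-8 internally, so Unicode rules apply.
my $matches_default = $greek_alpha =~ /[[:alpha:]]/  ? 1 : 0;

# /a limits [[:alpha:]] to the ASCII letters A-Z and a-z.
my $matches_ascii   = $greek_alpha =~ /[[:alpha:]]/a ? 1 : 0;

print "default: $matches_default\n";   # 1
print "with /a: $matches_ascii\n";     # 0
```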
Re^2: Replacing symbols in a string
by AnomalousMonk (Archbishop) on Jul 16, 2011 at 17:45 UTC