Replacing symbols in a string

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Replacing symbols in a string by BrowserUk (Patriarch) on Jul 16, 2011 at 17:47 UTC
    $s = join '', map chr, 0 .. 255;;
    print $s;;
    ☺☻♥♦ ♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⌂ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜø£Ø×ƒáíóúñÑªº¿®¬½¼¡«»░▒▓│┤ÁÂÀ©╣║╗╝¢¥┐└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÎÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýÝ¯´±‗¾¶§÷¸°¨·¹³²■
    $s =~ tr[A-Za-z][ ]c;;
    print $s;;
                                                                     ABCDEFGHIJKLMNOPQRSTUVWXYZ      abcdefghijklmnopqrstuvwxyz
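A minimal standalone version of the `tr///c` approach above, run on a made-up sample string (the string and variable names are only for illustration). The `/c` flag complements the search list, so every character that is *not* in `A-Za-z` becomes a space, and `tr` returns how many characters it changed. The `/s` variant is an extra not used in the reply above: it squeezes each run of replaced characters down to a single space.

```perl
use strict;
use warnings;

my $sample = "Hello, World! 123";    # hypothetical input

# Replace every character outside A-Za-z with a space; tr returns the count.
my $spaced = $sample;
my $count  = ($spaced =~ tr[A-Za-z][ ]c);

# Same, but /s squeezes runs of the replacement character into one space.
(my $squeezed = $sample) =~ tr[A-Za-z][ ]cs;

print "[$spaced] ($count characters replaced)\n";
print "[$squeezed]\n";
```

Since `tr` works on a fixed character list and involves no regex engine, its result does not depend on locale or Unicode rules.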
Re: Replacing symbols in a string by Corion (Patriarch) on Jul 16, 2011 at 17:29 UTC
Depending on the encoding your string is in, the easiest approach is to use either `\W` (for ASCII) or `[[:^alpha:]]` (for Unicode?). See perlop on the `s///` operator and perlre on the character escapes used.

    s/\W/ /g;

Update: Fixed incorrect usage of POSIX character class, as spotted by AnomalousMonk. It's `[[:alpha:]]`, not `[:alpha:]` (this error is also mentioned in perlre; the latter produces a warning and just matches one of the characters `:`, `a`, `l`, `p` or `h`).
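To see how the two suggestions differ on a plain ASCII string, here is a small sketch (the sample string and variable names are made up). `\W` leaves digits and the underscore alone because they count as word characters, while `[[:^alpha:]]` replaces everything that is not a letter. `[^[:alpha:]]`, which negates the whole bracketed class, is just another way to spell the same thing.

```perl
use strict;
use warnings;

my $text = "foo_bar 42!";    # hypothetical input

# \W means "not a word character"; word characters include digits and '_'.
(my $by_nonword = $text) =~ s/\W/ /g;

# [[:^alpha:]] is the negated POSIX class: anything that is not alphabetic.
(my $by_nonalpha = $text) =~ s/[[:^alpha:]]/ /g;

# Negating a bracketed class that contains [:alpha:] is equivalent.
(my $by_negated_class = $text) =~ s/[^[:alpha:]]/ /g;

print "[$by_nonword]\n";
print "[$by_nonalpha]\n";
print "[$by_negated_class]\n";
```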
Re^2: Replacing symbols in a string by AnomalousMonk (Archbishop) on Jul 16, 2011 at 17:45 UTC
See also the POSIX Character Classes section in perlrecharclass.
Re^2: Replacing symbols in a string by JavaFan (Canon) on Jul 16, 2011 at 17:59 UTC
> `\W` (for ASCII)

Eh, no, well, maybe, perhaps, who can tell? `\W` will use Unicode semantics if the string is internally in UTF-8 format, if the pattern contains non-Latin-1 characters (well, sometimes, not always), or, in 5.14, if you use the `/u` modifier. Otherwise, if locale is in effect, or, in 5.14, if you use the `/l` modifier, it will use the semantics of whatever locale is in effect. The `/l` modifier overrules the heuristics that would otherwise trigger Unicode semantics. In all other cases, or if either the `/a` or the `/aa` modifier is in effect, ASCII semantics are used. Oh, and then there's `use feature 'unicode_strings';`, which may trigger the first case when you don't expect it; `use 5.012;` enables it.

> `[^:alpha:]` (for Unicode?)

`[:alpha:]` is a POSIX class. It should never match any code point larger than 255, but Perl has f*cked this one up. I think this is fixed by using one of the new 5.14 regexp modifiers, but I'm not quite sure.
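A small sketch of a few of the cases above, assuming perl 5.14 or later (for `/u` and `/a`) and a script that does not enable `unicode_strings`; the sample strings and variable names are made up. The same `s/\W/ /g` gives different results depending on how the string is stored internally, while `/u` and `/a` pin the behaviour down. The last check touches the POSIX-class point: in practice `[[:alpha:]]` does match a code point above 255, and `/a` restricts it to ASCII.

```perl
use strict;
use warnings;
# Deliberately no "use 5.012" or later: that enables unicode_strings
# and would change the default-rules results below.

my $str = "caf\x{e9}!";    # "café!" held as a native (non-UTF-8) string

# Default rules on a non-UTF-8 string: \x{e9} is not a word character.
(my $default = $str) =~ s/\W/ /g;

# Same characters, but stored internally as UTF-8: Unicode rules apply.
my $upgraded = $str;
utf8::upgrade($upgraded);
$upgraded =~ s/\W/ /g;

# 5.14 modifiers: /u forces Unicode rules, /a forces ASCII rules.
(my $unicode = $str) =~ s/\W/ /gu;
(my $ascii   = $str) =~ s/\W/ /ga;

# POSIX class vs. a code point above 255: GREEK SMALL LETTER ALPHA (U+03B1).
my $alpha_any   = ("\x{3b1}" =~ /[[:alpha:]]/)  ? 1 : 0;
my $alpha_ascii = ("\x{3b1}" =~ /[[:alpha:]]/a) ? 1 : 0;

for my $case ([default => $default], [upgraded => $upgraded],
              [u => $unicode], [a => $ascii]) {
    my ($name, $result) = @$case;
    printf "%-8s e-acute %s\n", $name,
        ($result =~ /\x{e9}/ ? 'kept' : 'replaced');
}
print "[[:alpha:]] matches U+03B1: $alpha_any; with /a: $alpha_ascii\n";
```

The strings are compared rather than printed directly so the output stays plain ASCII regardless of the terminal's encoding.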