Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I want to replace all non-alphabetic characters and symbols in a string with spaces. How can I do that?

Replies are listed 'Best First'.
Re: Replacing symbols in a string
by BrowserUk (Patriarch) on Jul 16, 2011 at 17:47 UTC

    $s = join '', map chr, 0 .. 255;; print $s;;
    ☺☻♥♦ ♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⌂ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜø£Ø×ƒáíóúñѪº¿®¬½¼¡«»░▒▓│┤ÁÂÀ©╣║╗╝¢¥┐└┴┬├─┼ãÃ╚╔╩╦╠═╬¤ðÐÊËÈıÍÎÏ┘┌█▄¦Ì▀ÓßÔÒõÕµþÞÚÛÙýݯ´­±‗¾¶§÷¸°¨·¹³²■

    $s =~ tr[A-Za-z][ ]c;; print $s;;
    ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz

Re: Replacing symbols in a string
by Corion (Patriarch) on Jul 16, 2011 at 17:29 UTC

    Depending on the encoding your string is in, the easiest approach is to use either \W (for ASCII) or [[:^alpha:]] (for Unicode?). See perlop for the s/// operator and perlre for the character classes and escapes used.

    s/\W/ /g;
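A minimal sketch comparing the substitution above with an explicit letter class, assuming plain ASCII input. The sample string and variable names are made up for illustration. One thing worth keeping in mind: \W treats digits and the underscore as word characters, so they survive the substitution, whereas a negated letter class replaces them as well.

```perl
use strict;
use warnings;

# Hypothetical sample input; not taken from the original question.
my $input = "Hello, World_42!";

# \W matches anything that is not a word character
# (word characters are letters, digits and the underscore).
(my $non_word = $input) =~ s/\W/ /g;

# An explicit negated class replaces everything except ASCII letters.
(my $non_alpha = $input) =~ s/[^A-Za-z]/ /g;

# The negated POSIX class; on this ASCII-only input it agrees
# with the explicit class above.
(my $posix = $input) =~ s/[[:^alpha:]]/ /g;

print "[$non_word]\n";   # [Hello  World_42 ]
print "[$non_alpha]\n";  # [Hello  World    ]
print "[$posix]\n";      # [Hello  World    ]
```

If the goal really is "letters only", the negated class (or tr with the c flag) is probably closer to what was asked than \W.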

    Update: Fixed incorrect usage of the POSIX character class, as spotted by AnomalousMonk. It's [[:alpha:]], and not [:alpha:] (this error is also mentioned in perlre, and will produce a warning, while matching :, a, l, p or h).

      \W (for ASCII)
      Eh, no, well, maybe, perhaps, who can tell?

      \W will be Unicode semantics if the string is internally in UTF-8 format, if the pattern contains non-Latin-1 characters (well, sometimes, not always), or, in 5.14, if you use the /u modifier. Otherwise, if locale is in effect, or, in 5.14, if you use the /l modifier, it will be using the semantics of whatever locale is in effect. The /l modifier will overrule the heuristics that would otherwise trigger Unicode semantics. And otherwise, or if either the /a or the /aa modifier is in effect, ASCII semantics will be used.
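To make the modifier part of this concrete, here is a small sketch that pins the semantics explicitly instead of relying on the heuristics. It assumes Perl 5.14 or later (the /u and /a modifiers do not exist before that), and the sample word is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample: "cafe" with an e-acute stored as the single code point 0xE9.
my $word = "caf\x{e9}";

# /u forces Unicode rules: U+00E9 is a letter, so \W finds nothing to replace.
(my $unicode = $word) =~ s/\W/ /gu;

# /a restricts \w to [A-Za-z0-9_], so the e-acute is replaced by a space.
(my $ascii = $word) =~ s/\W/ /ga;

print $unicode eq $word  ? "unchanged under /u\n"        : "changed under /u\n";
print $ascii   eq "caf " ? "e-acute replaced under /a\n" : "e-acute kept under /a\n";
```

Spelling out /a or /u this way means the result no longer depends on how the string happens to be stored internally.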

      Oh, and then there's use feature 'unicode_strings'; that may trigger the first case when you don't expect it. use 5.012; will enable it.
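A rough sketch of how the feature can change the outcome for the very same string and pattern. It assumes Perl 5.14 or later, where, as far as I understand perlre, unicode_strings makes patterns compiled in its scope use Unicode rules; the variable names are only illustrative.

```perl
use strict;
use warnings;

# A native 8-bit string holding only code point 0xE9 (e-acute in Latin-1).
my $char = "\x{e9}";

# Without the feature (and without locale) this string gets ASCII rules,
# so the e-acute is not treated as a word character.
my $default_match = ($char =~ /\w/) ? 1 : 0;

my $feature_match;
{
    # Lexically scoped: patterns compiled inside this block use Unicode rules.
    use feature 'unicode_strings';
    $feature_match = ($char =~ /\w/) ? 1 : 0;
}

print "default: $default_match, with unicode_strings: $feature_match\n";
```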

      [^:alpha:] (for Unicode?).
      [:alpha:] is a POSIX class. It should never match any code point larger than 255 - but Perl has f*cked this one up. I think this is fixed by using one of the new 5.14 regexp modifiers, but I'm not quite sure.
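A quick way to check this on your own perl, assuming 5.14 or later: perlre documents /a as restricting the POSIX classes (along with \d, \s and \w) to the ASCII range. The Greek letter is just an arbitrary code point above 255 chosen for the test.

```perl
use strict;
use warnings;

# GREEK SMALL LETTER ALPHA, a code point well above 255.
my $greek = "\x{3b1}";

# Default rules: the string is stored as UTF-8, so Unicode semantics apply.
my $default_match = ($greek =~ /[[:alpha:]]/)  ? "matches" : "does not match";

# /a limits [[:alpha:]] to the ASCII letters.
my $ascii_match   = ($greek =~ /[[:alpha:]]/a) ? "matches" : "does not match";

print "default rules: $default_match\n";
print "with /a:       $ascii_match\n";
```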