Samn has asked for the wisdom of the Perl Monks concerning the following question:

Re: Non alphabet characters
by rob_au (Abbot) on Mar 28, 2002 at 05:32 UTC
    I think what you are looking for is [^\w]. It may be worth reading through perlre, from which I quote:

    A \w matches a single alphanumeric character, not a whole word. To match a word you'd need to say \w+. If use locale is in effect, the list of alphabetic characters generated by \w is taken from the current locale. See the perllocale manpage. You may use \w, \W, \s, \S, \d, and \D within character classes (though not as either end of a range).

    Update - Too little caffeine ... [^\w] is equivalent to \W, as Chmrr++ rightly points out below.
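
    To illustrate the perlre passage about using \w and \s inside a character class, here is a minimal sketch. The helper names (strip_punctuation, strip_with_negated_class, strip_with_capital_w) are made up for this example and the sample string is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: remove everything that is neither a word
# character nor whitespace, by combining \w and \s in one negated class.
sub strip_punctuation {
    my ($text) = @_;
    (my $clean = $text) =~ s/[^\w\s]//g;
    return $clean;
}

# [^\w] and \W describe the same set, so these two helpers should agree.
sub strip_with_negated_class { my ($t) = @_; $t =~ s/[^\w]//g; return $t; }
sub strip_with_capital_w     { my ($t) = @_; $t =~ s/\W//g;    return $t; }

print strip_punctuation('Hello, world! (test_1)'), "\n";
```

    Note that the underscore in test_1 is kept, because _ counts as a word character.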

     

      [^\w] is more canonically written as \W. Note that that is a capital W. Also perhaps of note is that \w is equivalent to [a-zA-Z0-9_] (that is, alphanumeric or the underscore); this means that [^\w] or \W won't catch numbers, which the poster specifically included in their post. Thus, you'd be better off using [^a-zA-Z]. As always, when the set you want is large, it's easier to specify what you don't want than what you do want.
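
      A small sketch of the difference, assuming plain ASCII input. The helper names strip_non_word and strip_non_alpha and the sample string are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers comparing the two patterns discussed above.
sub strip_non_word  { my ($s) = @_; $s =~ s/\W//g;        return $s; }
sub strip_non_alpha { my ($s) = @_; $s =~ s/[^a-zA-Z]//g; return $s; }

my $sample = 'abc_123-XYZ!';
print strip_non_word($sample),  "\n";   # digits and underscore survive
print strip_non_alpha($sample), "\n";   # only ASCII letters survive
```

      With \W the digits and the underscore are left in place; with [^a-zA-Z] they are removed along with the punctuation.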

      perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'

        Also perhaps of note is that \w is equivalent to [a-zA-Z0-9_]

        That is not always true. \w is subject to locale settings if the locale pragma is in use. See perllocale for more info. It is not always equivalent to [a-zA-Z0-9_].
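
        Which locales are installed varies from system to system, so a locale-based demonstration is hard to make portable. As a stand-in, the sketch below uses a non-ASCII letter under Perl's Unicode rules rather than a locale; this is a substitution on my part, but it shows the same point, namely that \w can match more than [a-zA-Z0-9_]. The helper names is_word_char and is_ascii_word are invented for this example.

```perl
use strict;
use warnings;

# U+0436 is CYRILLIC SMALL LETTER ZHE, a letter outside the ASCII range.
# chr() with a code point above 255 yields a character string, so the
# regex engine applies Unicode rules to it.
my $zhe = chr(0x436);

sub is_word_char  { my ($c) = @_; return $c =~ /^\w$/            ? 1 : 0; }
sub is_ascii_word { my ($c) = @_; return $c =~ /^[a-zA-Z0-9_]$/  ? 1 : 0; }

printf "\\w: %d, [a-zA-Z0-9_]: %d\n", is_word_char($zhe), is_ascii_word($zhe);
```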

        --
        Ilya Martynov (http://martynov.org/)

Re: Non alphabet characters
by derby (Abbot) on Mar 28, 2002 at 12:12 UTC
    Samn,

    Just remember that \W will not match _ (the underscore); it's part of the \w world. To get the "real" alphabet and be nice to locales, try using the POSIX class [:alpha:], like so:

    #!/usr/bin/perl
    $alpha = 'aeiou_XZY_^#@_123_BCD';
    ($one = $alpha) =~ s/\W//g;             # _ stays
    ($two = $alpha) =~ s/[^[:alpha:]]//g;   # _ gone
    print $one, "\n";
    print $two, "\n";

    -derby