in reply to Why [[:alpha:]] doesn't involve diacritic characters in replace expression?

How about if we use the character class for this?

$str =~ s/[^[:alpha:]]//g; # the /i flag isn't needed
I think what you found is either a bug in perl or a bug in the documentation. By my reading of perlre, [[:^alpha:]] and [^[:alpha:]] should behave identically, but apparently they don't.
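
A minimal sketch for comparing the two negation forms side by side. The sample string and the variable names are made up for illustration, and the result may depend on the perl version and on how the string is encoded, so no particular output is claimed here.

```perl
use strict;
use warnings;
use utf8;    # the source below contains literal non-ASCII characters

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample text containing a few diacritic letters.
my $str = "déjà vu, señor!";

# Negation written outside the POSIX class: [^[:alpha:]]
(my $bracket_negated = $str) =~ s/[^[:alpha:]]//g;

# Negation written inside the POSIX class (the Perl extension): [[:^alpha:]]
(my $posix_negated = $str) =~ s/[[:^alpha:]]//g;

print "[^[:alpha:]] leaves: $bracket_negated\n";
print "[[:^alpha:]] leaves: $posix_negated\n";
print $bracket_negated eq $posix_negated
    ? "the two forms agree on this string\n"
    : "the two forms DISAGREE on this string\n";
```

If the two forms really are equivalent, as perlre suggests, the last line should report agreement.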

Re^2: Why [[:alpha:]] doesn't involve diacritic characters in replace expression?
by graff (Chancellor) on Oct 25, 2008 at 17:32 UTC
    Thank you (++) for pointing that out, Tanktalus. Your post prompted me to do an experiment, comparing the POSIX and Unicode-based "alpha" patterns (presented here as a sequence of bash shell commands).

    The [[:^blah:]] syntax is described in perlre as "a Perl extension" to the POSIX syntax. My experiment shows that this extension, as applied to ":alpha:", creates a "special" class of characters that match both ":alpha:" and ":^alpha:" -- these happen to be the letter characters in the upper half of the Latin-1 table (code points 0x80 through 0xFF).

    (Using the "normal" method for inverting character classes --  [^[:alpha:]] -- has the expected behavior of providing the exact complement of [[:alpha:]].)
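
    A rough way to check this on any given perl is to sweep the Latin-1 range and test each character against all three patterns. This is only a sketch, not the original experiment: the variable names are hypothetical, and the utf8::upgrade call is an assumption made here to force character (Unicode) semantics for the match.

```perl
use strict;
use warnings;

# Code points that match both [[:alpha:]] and [[:^alpha:]], and code points
# where [[:^alpha:]] and [^[:alpha:]] give different answers.
my (@both, @differ);

for my $cp (0 .. 255) {
    my $ch = chr $cp;
    utf8::upgrade($ch);    # assumption: match under character semantics

    my $is_alpha    = $ch =~ /[[:alpha:]]/  ? 1 : 0;
    my $posix_neg   = $ch =~ /[[:^alpha:]]/ ? 1 : 0;
    my $bracket_neg = $ch =~ /[^[:alpha:]]/ ? 1 : 0;

    push @both,   $cp if $is_alpha && $posix_neg;
    push @differ, $cp if $posix_neg != $bracket_neg;
}

# Format a list of code points as U+XXXX, or "none" if the list is empty.
my $fmt = sub { @_ ? join(' ', map { sprintf 'U+%04X', $_ } @_) : 'none' };

printf "perl %vd\n", $^V;
printf "match both :alpha: and :^alpha:        : %s\n", $fmt->(@both);
printf "[[:^alpha:]] differs from [^[:alpha:]] : %s\n", $fmt->(@differ);
```

    On a perl where the two negation forms are true complements of [[:alpha:]], both lists should come out empty; any code points listed are the "special" ones described above.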

    Maybe this could be viewed as a "feature" of the ":^alpha:" syntax, but only if people know about it. Considering that it isn't explained as such (at all) in the perlre man page -- and since it clearly differs from [^[:alpha:]] -- I'd have to say it's more likely to be a bug. (My experiment used perl 5.8.8 built for darwin.)