Re: detecting non-ascii chars in a string

The answer depends on your version of perl. The job becomes much more powerful, if more difficult, with the Unicode support available in recent perls.

For a perl without Unicode support, you only need a character class for the 8-bit values of interest. This one catches them all:

my $NotASCII = '[\x80-\xff]';

Don't put that between \Q and \E in a regex. You want it interpolated with its metacharacters unescaped, so that it acts as a character class rather than as literal text.

It is easy to get exactly what you want with a Unicode-aware perl:

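use utf8;    # source is UTF-8; $_ must already hold characters, not raw bytes
# \p{InHiragana} and \p{InKatakana} match characters in those Unicode blocks
print 'Hiragana or Katakana detected!', $/
    if /[\p{InHiragana}\p{InKatakana}]/;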
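Here is a minimal, self-contained sketch of both checks, assuming a modern perl with the core Encode module. The helper names has_non_ascii_byte and has_kana are hypothetical, and decoding the input with Encode::decode is my assumption about where the character data comes from; the snippet above simply relies on $_ already holding characters. The last lines show what goes wrong if the class is wrapped in \Q...\E.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte-oriented class: any byte 0x80..0xFF is outside 7-bit ASCII.
my $NotASCII = '[\x80-\xff]';

# Hypothetical helper: 1 if a byte string contains a non-ASCII byte, else 0.
sub has_non_ascii_byte {
    my ($bytes) = @_;
    # $NotASCII is interpolated as a live character class, not as literal text.
    return $bytes =~ /$NotASCII/ ? 1 : 0;
}

# Hypothetical helper: decode UTF-8 bytes to characters, then test the kana blocks.
sub has_kana {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return $chars =~ /[\p{InHiragana}\p{InKatakana}]/ ? 1 : 0;
}

my $plain = 'hello';
my $kana  = "\xe3\x81\x82";   # UTF-8 bytes for U+3042 HIRAGANA LETTER A

print has_non_ascii_byte($plain), "\n";   # 0
print has_non_ascii_byte($kana), "\n";    # 1
print has_kana($kana), "\n";              # 1

# The \Q...\E mistake: the class is escaped and only matches its own source text.
my $quoted = ($kana =~ /\Q$NotASCII\E/) ? 1 : 0;
print "$quoted\n";                        # 0
```

Decoding first is what lets the \p{...} properties see one character per kana rather than three separate bytes.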

Counting characters can give different results from one perl to the next. For example, the length builtin uses character semantics, so its result for the same data depends on whether utf8 is in force. To count bytes instead, load the bytes module without switching on the pragma, use bytes ();, and call bytes::length.
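A short sketch of the difference, assuming a perl recent enough to support \x{...} escapes. The variable names are illustrative only. Note that bytes::length reports the size of perl's internal encoding of the string, which is why the bytes documentation discourages relying on it outside of debugging.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without turning on byte semantics

my $chars = "\x{3042}";       # one character: HIRAGANA LETTER A
my $raw   = "\xe3\x81\x82";   # the same text as three raw UTF-8 bytes

print length($chars), "\n";          # 1: character semantics
print bytes::length($chars), "\n";   # 3: bytes in the internal UTF-8 form
print length($raw), "\n";            # 3: each byte is its own character
```

The same kana counts as 1 or 3 depending on whether you hold decoded characters or raw bytes, which is exactly the kind of inconsistency described above.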

After Compline,
Zaxo
