in reply to Regular Expressions on Unicode
A. This does not work:
$line doesn't contain MODIFIER LETTER GLOTTAL STOP (U+02C0); it contains some encoding of it (for example, the two bytes CB 80 if the input is UTF-8). You need to decode the input or tell Perl to do it for you.
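To see the difference, here is a small sketch that assumes UTF-8 input. It compares the raw bytes for U+02C0 with the decoded string and checks whether a regex for the character matches each one.

```perl
use strict;
use warnings;

# The UTF-8 encoding of U+02C0 MODIFIER LETTER GLOTTAL STOP is the two
# bytes CB 80. This is what $line holds if the input was read undecoded.
my $bytes = "\xCB\x80";

# The regex looks for the single character U+02C0, but $bytes holds two
# characters (U+00CB and U+0080), so there is no match.
my $raw_match = $bytes =~ /\x{02C0}/ ? 1 : 0;

# After decoding, the string holds the one character U+02C0.
my $chars = $bytes;
utf8::decode($chars);
my $decoded_match = $chars =~ /\x{02C0}/ ? 1 : 0;

printf "bytes:   length %d, match %d\n", length($bytes), $raw_match;
printf "decoded: length %d, match %d\n", length($chars), $decoded_match;
```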
You can tell Perl to handle the decoding using the :encoding PerlIO layer. It can be added to handles using binmode or use open.
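A minimal sketch of the three ways to attach the layer to an ordinary handle. The scratch file written with File::Temp and its contents are just for illustration; the point is that every line read back already holds U+02C0 as a character.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the raw UTF-8 bytes for U+02C0 to a scratch file (illustration only).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                       # write the bytes untranslated
print {$out} "glottal stop: \xCB\x80\n";
close($out) or die "close: $!";

# 1. Ask for the layer in the mode argument of open.
open(my $in1, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my $line1 = <$in1>;
close($in1);

# 2. Add the layer to an already-open handle with binmode.
open(my $in2, '<', $path) or die "open $path: $!";
binmode($in2, ':encoding(UTF-8)');
my $line2 = <$in2>;
close($in2);

# 3. Make it the default for input handles opened in this lexical scope.
my $line3;
{
    use open IN => ':encoding(UTF-8)';
    open(my $in3, '<', $path) or die "open $path: $!";
    $line3 = <$in3>;
    close($in3);
}

for my $l ($line1, $line2, $line3) {
    print $l =~ /\x{02C0}/ ? "match\n" : "no match\n";
}
```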
There's a catch.
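<> is short for <ARGV>. ARGV is a special handle, and unfortunately, adding PerlIO layers to it doesn't work well: Perl opens it behind the scenes, once for each file named in @ARGV, so a layer added with binmode only applies to the file that happens to be open at the time.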
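One way around the catch is to skip the magic handle and open each file yourself. The sketch below is only an illustration; the helper name each_utf8_line is hypothetical, and unlike <> it neither treats "-" as STDIN nor sets $ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: calls $callback once per line of each file in
# @files (or of STDIN when @files is empty), with every line already
# decoded from UTF-8. Because it opens each handle itself, the
# :encoding layer is applied to every file, not just the first.
sub each_utf8_line {
    my ($callback, @files) = @_;

    if (!@files) {
        binmode(STDIN, ':encoding(UTF-8)');
        while (my $line = <STDIN>) {
            $callback->($line);
        }
        return;
    }

    for my $file (@files) {
        open(my $fh, '<:encoding(UTF-8)', $file)
            or die "Can't open $file: $!";
        while (my $line = <$fh>) {
            $callback->($line);
        }
        close($fh);
    }
    return;
}

# Usage in a script that would otherwise loop over <>:
#   each_utf8_line(sub { print "found\n" if $_[0] =~ /\x{02C0}/ }, @ARGV);
```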
It might be simplest to handle the decoding yourself. If the input is encoded using UTF-8, all you need is
while (my $line = <>) { utf8::decode( $line ); ... }
For other encodings, use Encode's decode.
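For example, assuming the input were UTF-16LE (an encoding picked only for illustration), Encode's decode takes the encoding name and the raw bytes and returns a character string. The second half shows the optional CHECK argument: by default malformed bytes become U+FFFD, while FB_CROAK makes decode die instead (LEAVE_SRC keeps it from modifying the source string).

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+02C0 encoded as UTF-16LE (low byte first) is the bytes C0 02.
my $octets = "\xC0\x02";

# decode() returns a string of characters, so the regex sees U+02C0.
my $line = decode('UTF-16LE', $octets);
print length($line), " ", ($line =~ /\x{02C0}/ ? "match" : "no match"), "\n";

# A truncated UTF-8 sequence. With FB_CROAK, decode() dies instead of
# substituting U+FFFD; LEAVE_SRC leaves $bad untouched.
my $bad = "\xCB";
my $ok = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print $ok ? "decoded\n" : "malformed input rejected\n";
```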