in reply to Regexp and Linux (is it utf issue?)
When you read text files, you should decode them. This is easy with PerlIO layers, the Encode module, and the three-argument form of open:
use Encode; open my $fh, "<:encoding(whatever)", $filename or die $!;
This way, Perl decodes everything automatically, and you only have to work with characters, not bytes.
When you write text to files, writing characters without encoding them produces the famous warning "Wide character in (sub name)...". You need to encode them using the same technique: open my $write, ">:encoding(whatever)", $filename or die $!; You can also use the :utf8 layer to encode characters on output, because they are internally stored as valid UTF-8.
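To make the read and write sides concrete, here is a minimal round-trip sketch. It assumes the encoding is UTF-8 (the "whatever" above stands for whichever encoding your files really use), and the sample string and temporary file are made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: a character string with non-ASCII characters.
my $text = "na\x{EF}ve caf\x{E9} \x{263A}\n";

# Scratch file for the demonstration; removed when the program exits.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh or die $!;

# Encode on output: characters go in, UTF-8 bytes land on disk.
open my $write, ">:encoding(UTF-8)", $filename or die $!;
print {$write} $text;
close $write or die $!;

# Decode on input: UTF-8 bytes come off disk, characters come out.
open my $fh, "<:encoding(UTF-8)", $filename or die $!;
my $line = <$fh>;
close $fh or die $!;

# The string is counted in characters, the file in bytes.
printf "%d characters, %d bytes on disk\n", length($line), -s $filename;
```

The string that comes back compares equal to the one written, and the file is larger in bytes than the string is in characters, because each non-ASCII character takes more than one byte in UTF-8.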
Do not use the :utf8 I/O layer to decode text: it simply sets the "character" flag on strings read from the filehandle without any checks, which is generally unsafe. See: UTF8 related proof of concept exploit released at T-DOSE.
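As a rough illustration of what a checked decode looks like, the sketch below uses Encode's decode with the FB_CROAK check so that malformed input is rejected instead of being accepted silently. The helper name decode_strict and the sample byte strings are hypothetical, and this shows the Encode function rather than the exact behaviour of any particular I/O layer:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded character string,
# or undef when the bytes are not valid UTF-8.
sub decode_strict {
    my ($bytes) = @_;
    # FB_CROAK makes decode die on malformed input;
    # LEAVE_SRC keeps decode from modifying $bytes in place.
    my $chars = eval {
        decode("UTF-8", $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars;
}

my $good = "caf\xC3\xA9!";   # valid UTF-8 bytes
my $bad  = "caf\xE9!";       # a lone Latin-1 byte, malformed as UTF-8

for my $sample ($good, $bad) {
    my $chars = decode_strict($sample);
    print defined $chars ? "accepted\n" : "rejected\n";
}
```

Opening the file with :encoding(UTF-8), as above, runs the input through Encode in the same spirit, which is why it is the safer choice for decoding.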