Re^4: function length() in UTF-8 context

Because I use an editor in "UTF8" mode (namely SciTE),
I got constant strings which I typed in the script with accented chars UTF8-encoded.
From what you said, it seems better to keep the editor in ISO-8859-1 mode and encode the output if necessary,
depending on the value of the LANG variable, for example.
Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?

Re^5: function length() in UTF-8 context by ikegami (Patriarch) on Nov 19, 2008 at 10:57 UTC
> I got constant strings which I typed in the script with accented chars UTF8-encoded.

If your source code is UTF-8, use `use utf8;`.

```
>perl -le"binmode STDOUT, ':encoding(iso-latin-1)'; print qq{print length '\x85'}" | perl -l
1     Good

>perl -le"binmode STDOUT, ':utf8'; print qq{print length '\x85'}" | perl -l
2     BAD!

>perl -le"binmode STDOUT, ':utf8'; print qq{use utf8; print length '\x85'}" | perl -l
1     Good
```
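The same byte-versus-character difference can be reproduced inside a single script. The following is only a minimal sketch: it uses the core `Encode` module and escaped bytes rather than a literal accented character, so the result does not depend on how the file is saved. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that UTF-8 uses for "é" (U+00E9), written as escapes so the
# example does not depend on the encoding of this source file.
my $bytes = "\xC3\xA9";

# Without decoding, Perl sees two separate one-byte characters.
my $byte_len = length $bytes;

# After decoding, the string holds a single character.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

print "undecoded: $byte_len, decoded: $char_len\n";
```

This is roughly what `use utf8;` does for string literals in a UTF-8 source file: the literal is decoded when the script is compiled, so `length` counts characters instead of bytes.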
Re^5: function length() in UTF-8 context by ikegami (Patriarch) on Nov 19, 2008 at 11:10 UTC
> Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?

```
#!/usr/bin/perl

# This source file is encoded using UTF-8.
use utf8;

use strict;
use warnings;

# Use locale-dependent encoding for STDIO.
use open ':std', ':locale';

# Use locale-dependent encoding (by default)
# for all files opened in this scope.
# Unfortunately, <> ignores this directive.
use open IO => ':locale';

...
```

It might make more sense to use a known encoding for files, though.

```
use open IO => ':encoding(UTF-8)';
```

There's also File::BOM in case you want to accept UTF-8 (and UTF-16LE and UTF-16BE) while allowing a fallback to another encoding such as iso-latin-1.
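To illustrate the "BOM first, fallback otherwise" idea, here is a rough sketch using only the core `Encode` module. It is not File::BOM's API: `decode_with_bom_fallback` is a hypothetical helper written for this example. It works on a byte string already read into memory and ignores UTF-32 BOMs.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of File::BOM): decode a byte string according
# to its BOM, or with $fallback (an Encode encoding name) if there is no BOM.
sub decode_with_bom_fallback {
    my ($bytes, $fallback) = @_;

    # BOM byte sequences paired with the encoding they announce.
    my @boms = (
        [ "\xEF\xBB\xBF", 'UTF-8'    ],
        [ "\xFF\xFE",     'UTF-16LE' ],
        [ "\xFE\xFF",     'UTF-16BE' ],
    );

    for my $bom (@boms) {
        my ($mark, $encoding) = @$bom;
        if (index($bytes, $mark) == 0) {
            # Strip the BOM, then decode the remaining bytes.
            return decode($encoding, substr($bytes, length $mark));
        }
    }

    # No BOM found: assume the fallback encoding.
    return decode($fallback, $bytes);
}

# "é" as UTF-8 with a BOM, and as a bare iso-8859-1 byte.
my $from_utf8   = decode_with_bom_fallback("\xEF\xBB\xBF\xC3\xA9", 'iso-8859-1');
my $from_latin1 = decode_with_bom_fallback("\xE9",                 'iso-8859-1');

print length($from_utf8), " ", length($from_latin1), "\n";
```

Both inputs decode to the same one-character string, so the rest of the script can treat them alike, whichever encoding the data arrived in.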