in reply to Re^3: function length() in UTF-8 context
in thread function length() in UTF-8 context

Because I use an editor in "UTF8" mode (namely SciTE),
I got constant strings which I typed in the script with accented chars UTF8-encoded.
From what you said, it seems better to keep the editor in iso8859-1 mode and encode the output if necessary,
depending on the value of the LANG variable, for example.
Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?

Re^5: function length() in UTF-8 context
by ikegami (Patriarch) on Nov 19, 2008 at 10:57 UTC

    I got constant strings which I typed in the script with accented chars UTF8-encoded.

    If your source code is UTF-8, add use utf8; to the script.

    >perl -le"binmode STDOUT, ':encoding(iso-latin-1)'; print qq{print length '\x85'}" | perl -l
    1     Good

    >perl -le"binmode STDOUT, ':utf8'; print qq{print length '\x85'}" | perl -l
    2     BAD!

    >perl -le"binmode STDOUT, ':utf8'; print qq{use utf8; print length '\x85'}" | perl -l
    1     Good
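
The same effect can be reproduced inside a single script, without depending on how the file is saved. This is only a minimal sketch: the variable names are illustrative, and it uses Encode::decode on a byte string to stand in for what use utf8; does for literals in the source.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that encode U+00E9 (e with acute accent) in UTF-8,
# written with escapes so this file itself stays pure ASCII.
my $bytes = "\xC3\xA9";

# Undecoded, Perl treats each byte as a separate character.
print length($bytes), "\n";

# Decoded, Perl sees the single character those bytes represent.
my $chars = decode('UTF-8', $bytes);
print length($chars), "\n";
```

This prints 2 and then 1, which matches the BAD and Good cases above.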
Re^5: function length() in UTF-8 context
by ikegami (Patriarch) on Nov 19, 2008 at 11:10 UTC

    Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?

    #!/usr/bin/perl

    # This source file is encoded using UTF-8.
    use utf8;

    use strict;
    use warnings;

    # Use locale-dependent encoding for STDIO.
    use open ':std', ':locale';

    # Use locale-dependent encoding (by default)
    # for all files opened in this scope.
    # Unfortunately, <> ignores this directive.
    use open IO => ':locale';

    ...
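
If you would rather see which encoding the locale selects and apply it by hand, something along these lines may work. It is a sketch under assumptions: it relies on I18N::Langinfo being available on your platform, the fallback to UTF-8 is my own choice, and it is not necessarily how the :locale layer decides internally.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG).
setlocale(LC_CTYPE, '');

# Ask the C library which character encoding that locale uses.
my $codeset = langinfo(CODESET);

# Assumption for this sketch: fall back to UTF-8 when Encode
# does not recognise the name the system reports.
my $enc = ( defined $codeset && find_encoding($codeset) ) ? $codeset : 'UTF-8';

# Encode everything printed to STDOUT with the chosen encoding.
binmode STDOUT, ":encoding($enc)";
print "using output encoding: $enc\n";
```

The reported name depends entirely on the environment, so the output differs from system to system.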

    It might make more sense to use a known encoding for files, though.

    use open IO => ':encoding(UTF-8)';
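
A small round trip shows what that default does. The temporary file and the variable names are only for illustration; the sketch assumes nothing beyond core modules.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Default layer for every open() in this file that names no layer itself.
use open IO => ':encoding(UTF-8)';

# Scratch file for the demonstration only; removed at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# One character, U+00E9, written as an escape so the source stays ASCII.
my $text = "\x{E9}";

# Encoded to UTF-8 on the way out.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

# Size in bytes of what actually landed on disk.
my $size = -s $path;

# Decoded from UTF-8 on the way back in.
open my $in, '<', $path or die "Cannot read $path: $!";
my $round_trip = do { local $/; <$in> };
close $in;

print "bytes on disk: $size, characters read: ", length($round_trip), "\n";
```

One character goes out, two bytes land on disk, and one character comes back.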

    There's also File::BOM in case you want to accept UTF-8 (and -16le and -16be) while allowing a fallback to another encoding such as iso-latin-1.
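
For comparison, a fallback can also be written by hand with Encode alone. Note that this is not File::BOM's interface and does not look at byte order marks at all: it is a different technique that simply tries strict UTF-8 first and falls back to iso-8859-1 when the bytes are not valid UTF-8. The helper name decode_with_fallback is made up for this sketch.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode as UTF-8 when the bytes are valid UTF-8,
# otherwise treat them as iso-8859-1, which accepts any byte sequence.
sub decode_with_fallback {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };

    return defined $text ? $text : decode('iso-8859-1', $bytes);
}

my $from_utf8   = decode_with_fallback("caf\xC3\xA9");  # valid UTF-8
my $from_latin1 = decode_with_fallback("caf\xE9");      # not valid UTF-8

print "same text\n" if $from_utf8 eq $from_latin1;
```

The guess can be wrong for iso-8859-1 data that happens to form valid UTF-8, which is one reason a known encoding or a BOM is the safer signal.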