in reply to Re^2: function length() in UTF-8 context
in thread function length() in UTF-8 context

so, it seems that we have to "come back" to iso8859-1 previously to use safely "string functions"

I don't know what you mean by this.

You decode on input, and encode on output. Leave it decoded for the duration of your program. Unless you're working with binary file formats, you shouldn't have to call encode or decode. Just using an appropriate :encoding when opening files should take care of text files.
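A minimal sketch of that round trip, assuming a throwaway temp file and a made-up sample string (both are illustration only, not from the original question). The :encoding layer does the encoding on write and the decoding on read, so length() counts characters inside the program:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text ("cafe" with an acute accent), written with
# an escape so this source file stays pure ASCII.
my $text = "caf\x{E9}";

# Scratch file for the demonstration; removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Encode on output: the layer turns characters into UTF-8 bytes.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
print {$out} $text;
close $out;

# Decode on input: the layer turns UTF-8 bytes back into characters.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read $path: $!";
my $decoded = <$in>;
close $in;

# For comparison, read the same file as raw bytes.
open(my $raw, '<:raw', $path) or die "Cannot read $path: $!";
my $bytes = <$raw>;
close $raw;

my $char_count = length $decoded;   # characters in the decoded string
my $byte_count = length $bytes;     # bytes stored on disk

print "characters: $char_count, bytes: $byte_count\n";
```

The decoded string has 4 characters while the file holds 5 bytes, and no explicit encode or decode call was needed.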

Re^4: function length() in UTF-8 context
by didess (Sexton) on Nov 19, 2008 at 10:47 UTC
    Because of the use of an editor in "UTF8" mode (namely Scite),
    I got constant strings which I typed in the script with accented chars UTF8-encoded.
    From what you said, it seems better to keep the editor in iso8859-1 mode and encode the output if necessary,
    depending on the value of LANG variable, for example.
    Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?

      I got constant strings which I typed in the script with accented chars UTF8-encoded.

      If your source code is UTF-8, add "use utf8;" to the script.

      >perl -le"binmode STDOUT, ':encoding(iso-latin-1)'; print qq{print length '\x85'}" | perl -l
      1     Good

      >perl -le"binmode STDOUT, ':utf8'; print qq{print length '\x85'}" | perl -l
      2     BAD!

      >perl -le"binmode STDOUT, ':utf8'; print qq{use utf8; print length '\x85'}" | perl -l
      1     Good
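      The same effect can be sketched inside a single script. This assumes the file itself is saved as UTF-8, so the accented literal occupies two bytes on disk; the variable names are mine, for illustration only. The pragma is lexically scoped, so each do block sees its own setting:

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so the literal below
# occupies two bytes on disk.

# Without the pragma, perl reads the literal as two separate bytes.
my $len_without = do { no utf8; length 'é' };

# With the pragma, perl decodes the literal into one character.
my $len_with = do { use utf8; length 'é' };

print "without utf8: $len_without, with utf8: $len_with\n";
```

      Saved as UTF-8, this should report 2 without the pragma and 1 with it, matching the BAD and Good cases above.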

      Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?

      #!/usr/bin/perl

      # This source file is encoded using UTF-8.
      use utf8;

      use strict;
      use warnings;

      # Use locale-dependent encoding for STDIO.
      use open ':std', ':locale';

      # Use locale-dependent encoding (by default)
      # for all files opened in this scope.
      # Unfortunately, <> ignores this directive.
      use open IO => ':locale';

      ...

      It might make more sense to use a known encoding for files, though.

      use open IO => ':encoding(UTF-8)';
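      A small sketch of what that directive changes, assuming a throwaway temp file and a made-up sample string (both hypothetical). The open calls name no layer, yet the default from the pragma is applied because they sit in its lexical scope:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Default layer for every open() in this lexical scope that does not
# name a layer itself.
use open IO => ':encoding(UTF-8)';

# Scratch file for the demonstration; removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Hypothetical sample text ("naive" with a diaeresis), written with an
# escape so this source file stays pure ASCII.
my $text = "na\x{EF}ve";

# No layer given here: the pragma supplies :encoding(UTF-8).
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} $text;
close $out;

# Same on input: the bytes are decoded back into characters.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my $decoded = <$in>;
close $in;

my $char_count = length $decoded;   # characters seen by the program
my $file_size  = -s $path;          # bytes stored on disk

print "characters: $char_count, bytes on disk: $file_size\n";
```

      The program sees 5 characters while the file holds 6 bytes, so string functions work on characters without any explicit encode or decode call.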

      There's also File::BOM in case you want to accept UTF-8 (and -16le and -16be) while allowing a fallback to another encoding such as iso-latin-1.