Re^3: function length() in UTF-8 context

so, it seems that we have to "come back" to iso8859-1 previously to use safely "string functions"

I don't know what you mean by this.

You decode on input, and encode on output. Leave it decoded for the duration of your program. Unless you're working with binary file formats, you shouldn't have to call encode or decode. Just using an appropriate :encoding when opening files should take care of text files.
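
A minimal sketch of that workflow, under some assumptions that are mine rather than the original poster's: the sample word, the scratch file created with File::Temp (standing in for a real text file), and the variable names are all hypothetical. It reads through an `:encoding` layer, works on the decoded string, and lets a layer on STDOUT encode on the way out.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: "café" as UTF-8 bytes (the é takes two bytes).
my $bytes = "caf\xC3\xA9";

# Write the raw bytes to a scratch file, standing in for an existing text file.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} $bytes;
close $tmp or die "close: $!";

# Decode on input: the :encoding layer turns bytes into characters as we read.
open my $in, '<:encoding(UTF-8)', $path or die "open: $!";
my $text = <$in>;
close $in;

my $byte_len = length $bytes;   # counts the bytes of the undecoded data
my $char_len = length $text;    # counts the characters of the decoded string

# Encode on output: the layer on STDOUT turns characters back into bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "undecoded: $byte_len, decoded: $char_len\n";
print "$text\n";
```

Note that nothing in the script calls encode or decode directly; the layers do the work at the I/O boundaries, and length sees characters in between.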

Re^4: function length() in UTF-8 context by didess (Sexton) on Nov 19, 2008 at 10:47 UTC
Because of the use of an editor in "UTF8" mode (namely SciTE), I got constant strings which I typed in the script with accented chars UTF8-encoded. From what you said, it seems better to stay with the editor in iso8859-1 mode and encode the output if necessary, depending on the value of the LANG variable, for example. Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?
Re^5: function length() in UTF-8 context by ikegami (Patriarch) on Nov 19, 2008 at 10:57 UTC
I got constant strings which I typed in the script with accented chars UTF8-encoded.

If your source code is UTF-8, use `use utf8;`.

```
>perl -le"binmode STDOUT, ':encoding(iso-latin-1)'; print qq{print length '\x85'}" | perl -l
1      Good

>perl -le"binmode STDOUT, ':utf8'; print qq{print length '\x85'}" | perl -l
2      BAD!

>perl -le"binmode STDOUT, ':utf8'; print qq{use utf8; print length '\x85'}" | perl -l
1      Good
```
Re^5: function length() in UTF-8 context by ikegami (Patriarch) on Nov 19, 2008 at 11:10 UTC
Is it possible to "switch" a script in such a way that ALL outputs get encoded with respect to some locale setting we can read from the system?

```
#!/usr/bin/perl

# This source file is encoded using UTF-8.
use utf8;

use strict;
use warnings;

# Use locale-dependent encoding for STDIO.
use open ':std', ':locale';

# Use locale-dependent encoding (by default)
# for all files opened in this scope.
# Unfortunately, <> ignores this directive.
use open IO => ':locale';

...
```

It might make more sense to use a known encoding for files, though.

```
use open IO => ':encoding(UTF-8)';
```

There's also File::BOM in case you want to accept UTF-8 (and -16le and -16be) while allowing a fallback to another encoding such as iso-latin-1.