Re^3: Mess with UTF-8, utf8 and raw encoding on live working platform

There are two types of string variables in Perl. One type contains text, the other contains bytes.

Files contain bytes. If you want to write a text string to a file, the string needs to be converted from text to bytes. That's what opening the file with >:encoding(UTF-8) does.

OTOH if you want to read data from a file: without the :encoding(UTF-8) layer the string will contain bytes, and with it the bytes are decoded and the string contains text.
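A minimal sketch of the round trip, in case it helps. The sample string and the temporary file are just assumptions for the demo, not anything from your code:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A text string: 5 characters (G, r, u-umlaut, sharp s, e).
my $text = "Gr\x{FC}\x{DF}e";

# Hypothetical scratch file, only for this demonstration.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Writing: the layer encodes text to UTF-8 bytes on the way out.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

# Reading without a decoding layer: the string holds the raw bytes.
open my $raw, '<:raw', $path or die "Cannot read $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Reading with the layer: the bytes are decoded back into text.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my $decoded = do { local $/; <$in> };
close $in;

printf "bytes:   length %d\n", length $bytes;      # 7
printf "decoded: length %d\n", length $decoded;    # 5
```

Same file, two different strings: the byte string is longer because each of the two non-ASCII characters takes two bytes in UTF-8.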

The sad thing is that you can't reliably tell by looking at a string whether it holds text or bytes. And if you mix the two up, you get broken output, typically double-encoded or garbled characters.
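Here is a small sketch of one way the mix-up bites, using the core Encode module. The sample data is made up; the point is only that Perl happily encodes a string that was already bytes:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for a 5-character word (7 bytes), e.g. as read from a raw file.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe";

# Correct: decode the bytes into text first ...
my $text = decode('UTF-8', $bytes);

# ... and encode the text when it leaves the program.
my $correct = encode('UTF-8', $text);

# Mistake: encoding the byte string as if it were text.
# Each of the four high bytes is treated as a character and becomes two bytes.
my $double = encode('UTF-8', $bytes);

printf "text:    %d characters\n", length $text;      # 5
printf "correct: %d bytes\n",      length $correct;   # 7
printf "double:  %d bytes\n",      length $double;    # 11
```

Nothing warns you in the mistaken case; you only notice when the output shows two odd characters where one was expected.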

So if you use text strings internally in your program, you need the :encoding(UTF-8) layer both for reading and for writing files, and you need to decode all other byte strings that come into your program (for example, values from %ENV or @ARGV).
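For the non-file inputs, something along these lines should work. It is only a sketch: it assumes your environment hands the program UTF-8 encoded bytes, and decode_args is a helper name I made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a list of UTF-8 byte strings into text strings.
# Assumption: the shell/environment passes arguments as UTF-8.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

# Decode everything that comes in from outside, once, at the boundary.
my @args = decode_args(@ARGV);
my %env  = map { $_ => decode('UTF-8', $ENV{$_}) } keys %ENV;

# Encode on the way out, so text strings print correctly.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @args;
```

Doing the decoding in one place at startup means the rest of the program only ever sees text strings.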

OTOH some modules already decode strings for you (for example XML and JSON parsers), so you need to know which modules do that; otherwise you end up decoding the same data twice.
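As an illustration, here is a sketch with JSON::PP (my choice for the example, since it ships with recent Perls; other parsers may behave differently, so check their docs). Its decode_json function takes UTF-8 bytes and returns already-decoded text:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# UTF-8 encoded JSON bytes, e.g. as read from a raw file or a socket.
my $json_bytes = qq({"name":"caf\xC3\xA9"});

# decode_json expects UTF-8 bytes and returns text strings.
my $data = decode_json($json_bytes);

# The value is already text: 4 characters. Do not decode it again.
printf "name has %d characters\n", length $data->{name};    # 4
```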

Re^4: Mess with UTF-8, utf8 and raw encoding on live working platform by AlfaProject (Beadle) on Jun 05, 2011 at 07:15 UTC
I found a Russian article that said to add the line `use encoding 'UTF-8'` to each file ... Now everything works. Thanks to all.