
I started writing programs with Russian text I/O in mind only a few years ago. Before that, I just saved my Perl program files as UTF-8, thinking that this made my programs Unicode-enabled, as I had read in various places. I had to start including those Unicode and locale settings after I noticed that Russian text was scrambled on output. I am still pretty vague about UTF-8 in Perl, because there is so much else in Perl that I have to learn and want to learn. Whenever I look at the Unicode/UTF-8 family of Perl documentation, there is so much to read (I have read it more deeply in the last few months, but compared with how much is left on the subject, that is almost nothing) that I always end up preferring to read about other Perl topics. I read more about Unicode when I run into a problem. I have Perl 5.14 installed on my computer. I recently set up new servers, ran my programs on them, and noticed warnings about deprecated encoding. The servers have Perl 5.18 installed. I have been thinking about updating my code, since those Unicode settings were added to all of my libraries. I haven't done it yet because of other deadlines I have to meet. I will try your suggestion and see what it does for me.

Re^7: problem with hashes, loaded from file
by Anonymous Monk on Dec 25, 2014 at 21:10 UTC
    Yes, it is very unfortunate. Perl's Unicode capabilities are among the best of any programming language (I think only ICU is comparable?), but its string handling is very confusing, and the documentation is huge and scattered all over the place. There have been discussions about that already... Basically, the best approach is to decode all input and encode all output. The main tools are: use utf8, use open (the pragma), open (the function), binmode, and Encode. Of course, if some filenames are not valid UTF-8, they can't be decoded as such.
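To make "decode all input, encode all output" concrete, here is a minimal sketch. It assumes the data file is UTF-8; the temporary file and the names $path, $word, %seen and $bytes are made up for illustration and are not from the original program.

```perl
use strict;
use warnings;
use utf8;                          # string literals in this source file are UTF-8
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Encode everything printed to STDOUT as UTF-8.
# (The pragma form, use open qw(:std :encoding(UTF-8)), sets similar defaults.)
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical data file, created here only so the sketch is self-contained.
my (undef, $path) = tempfile(UNLINK => 1);

# Output: the :encoding layer turns characters into UTF-8 bytes on write.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
print {$out} "привет\n";
close($out) or die "Cannot close $path: $!";

# Input: the same layer decodes UTF-8 bytes into characters on read.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read $path: $!";
chomp(my $word = <$in>);
close($in);

# Hash keys are now character strings, so they match literals under "use utf8".
my %seen;
$seen{$word}++;
print "found: $word\n" if exists $seen{'привет'};

# Encode does the same conversions by hand, for data that bypasses I/O layers.
my $bytes = encode('UTF-8', $word);
print length($word), " characters, ", length($bytes), " bytes\n";
```

If the hash keys are read without decoding while the lookup strings are decoded (or the other way round), the lookups silently fail, which is one way a hash loaded from a file can appear to be broken.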