The correct way of handling encodings in Perl is not caring about. If you're caring too much, you're doing the wrong way...
I wish I could agree with this statement... but I'm afraid I can't.
During the last few months at work, I've been involved in a number of Perl projects in Japanese and Chinese environments, where correct handling of encodings is of paramount importance (in particular on Windows, with its unholy mixture of encodings, like UCS-2, UTF-8 and various legacy codepages). During that time, I've run into several encoding issues where you just have to "care too much" (to use your words), or else things simply won't work.
For one, Perl doesn't (yet) provide any convenient abstraction layer for handling file names (as opposed to file contents), which means you have to take care of everything yourself (by writing wrapper functions, calling Encode::encode and Encode::decode explicitly, etc.). In case you're interested in the details, look here for the kind of things I have in mind.
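To give an idea of what such wrappers tend to look like, here is a minimal sketch. The helper names (fs_encode, fs_decode, open_named, list_dir) are made up for illustration, and the choice of cp932 as the file system encoding is purely an assumption; the right value depends on the platform and locale.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption: the OS file APIs expect byte strings in this encoding.
# 'cp932' is only an illustrative choice (a Japanese Windows codepage).
my $FS_ENCODING = 'cp932';

# Hypothetical helper: character string -> bytes for open(), opendir(), ...
sub fs_encode {
    my ($name) = @_;
    return encode($FS_ENCODING, $name);
}

# Hypothetical helper: bytes returned by readdir() etc. -> character string
sub fs_decode {
    my ($bytes) = @_;
    return decode($FS_ENCODING, $bytes);
}

# Hypothetical wrapper: open a file whose name is given as characters.
sub open_named {
    my ($mode, $name) = @_;
    open(my $fh, $mode, fs_encode($name))
        or die "Cannot open file: $!";
    return $fh;
}

# Hypothetical wrapper: list a directory, returning decoded entry names.
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, fs_encode($dir))
        or die "Cannot open directory: $!";
    my @names = map  { fs_decode($_) }
                grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}
```

Every built-in that takes or returns a file name (open, opendir, readdir, unlink, rename, -e and friends) needs this kind of treatment, which is exactly the manual work I'm complaining about.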
This isn't the only problem, though. There are a few "borderline" bugs, like the one I posted recently in the hope of getting some feedback on whether other people would also consider it a bug. (Didn't work out, btw. Not a single reply -- which makes me conclude that, with respect to Unicode issues, there's not exactly an overwhelming amount of interest in the Perl community. Kind of a pity, but such is life.) Anyway, what I mean to say is that having to figure out that you need to specify :raw:encoding(ucs-2le):crlf:utf8 to read/write ordinary UCS-2 files (as frequently encountered on Windows platforms) is just a bit too much "having to care" for my taste... Not to forget the bug revealed in this thread, and other oddities related to subtle differences between use utf8 and use encoding 'utf8', for example.
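For reference, this is roughly how that layer string ends up being used. It's only a sketch: the function names are made up, the layer stack is the one quoted above, and I'm not claiming that every layer in it is required on every Perl version. Byte order marks are not handled here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Layer stack as quoted above; bottom-to-top: raw bytes, UCS-2LE
# decoding/encoding, CRLF <-> "\n" translation, utf8 flag on top.
my $LAYERS = ':raw:encoding(ucs-2le):crlf:utf8';

# Hypothetical helper: write lines as a UCS-2LE file with CRLF line ends.
sub write_ucs2le {
    my ($path, @lines) = @_;
    open(my $out, ">$LAYERS", $path) or die "Cannot write $path: $!";
    print {$out} "$_\n" for @lines;
    close($out) or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read a UCS-2LE file back into a list of lines.
sub read_ucs2le {
    my ($path) = @_;
    open(my $in, "<$LAYERS", $path) or die "Cannot read $path: $!";
    my @lines = <$in>;
    close($in);
    chomp @lines;
    return @lines;
}

# Round trip through a temporary file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);
write_ucs2le($path, "\x{65E5}\x{672C}\x{8A9E}", 'plain ASCII');
my @back = read_ucs2le($path);
```

Compare that with what you'd naively expect to be sufficient, i.e. a single :encoding(...) layer, and you see what I mean.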
Of course, whether something is a bug is always somewhat subjective, as it largely depends on your expectations of how things should work, but I think we're not doing ourselves a favor by pretending that everything encoding-related in Perl works without hassles...
Sorry for the rant, and don't get me wrong. I'm a big fan of Perl, and I would surely advocate Perl wherever appropriate. However, in one of the projects mentioned above, I've had a rather hard time convincing my clients to stick with Perl, and not switch to some other language altogether. This involved investing quite a few unpaid hours on my side (spent on debugging and working around various peculiarities) to keep the price competitive.
Hope you can forgive the somewhat emotional tone of this post. In any case it's not meant to attack you personally, ruoso. Just needed to vent a little... and I'm feeling better now :)
In reply to Re^2: utf8, locale and regexp
by almut
in thread utf8, locale and regexp
by Anonymous Monk