I am in an environment where there is no use of Unicode at all. Not in the source, not in the applications, and not on the websites. I dare say that folks aren't really aware of the implications of supporting it, the benefits of using it, or the effort it might take. But this will change.

I am wondering how such a change might be brought about? Begin with modifying applications to accept utf8, then a pilot project where some of the data is changed? Or perhaps just start using IO layers everywhere first? I presume an incremental approach is possible, because that would have to be the way it is handled here. My Perl applications are ideally placed to attempt this.

Anyone done this, and have any advice to share?

Re: Introduction of Unicode
by dragonchild (Archbishop) on Jun 30, 2004 at 13:30 UTC
    The first thing to do is to figure out if you even need Unicode. Converting an application to support Unicode is a very expensive and time-consuming process. Often, it's easier to rewrite the app(s) from scratch than to convert an existing app.

    Second, determine which Unicode encoding(s) you want to support. Usually, this doesn't matter much because most (if not all) open-source solutions support all the common encodings. But some proprietary systems may support utf8 but not ucs2, etc. Conversions exist between most encodings, but setting them up is time-consuming.
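
    To make the conversion point concrete, here is a minimal sketch using the Encode module that ships with Perl 5.8. The transcode() helper is a hypothetical name invented for this illustration, and the choice of Latin-1 as the legacy encoding is an assumption; substitute whatever your data actually uses.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Encode): re-encode a byte string
# from one encoding to another by going through Perl characters.
sub transcode {
    my ($bytes, $from, $to) = @_;

    # decode() turns raw bytes into a Perl character string.
    # FB_CROAK makes malformed input die instead of being silently
    # replaced; LEAVE_SRC keeps Encode from modifying its argument.
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;
    my $chars = decode($from, $bytes, $check);

    # encode() turns the character string back into bytes.
    return encode($to, $chars, $check);
}

# Assumed legacy data: e-acute as the single Latin-1 byte 0xE9.
my $latin1 = "\xE9";
my $utf8   = transcode($latin1, 'iso-8859-1', 'UTF-8');
my $ucs2   = transcode($latin1, 'iso-8859-1', 'UCS-2BE');

# Show the resulting bytes in hex.
for my $pair ([ 'UTF-8', $utf8 ], [ 'UCS-2BE', $ucs2 ]) {
    my ($name, $bytes) = @$pair;
    printf "%-8s %s\n", $name,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```

    Going through decode() and then encode() keeps the "bytes in, characters inside, bytes out" boundary explicit, which is usually what you want when only part of a system has been converted.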

    Next is to determine if your architecture currently supports Unicode. Depending on what your systems are doing, you might have to make sure your OS(es), database(s), and other 3rd-party applications support it. If they don't, you have to start there.

    After all that's done, upgrade your Perl to 5.8.4 and handle that. Before 5.8.0, Unicode was supported largely through modules, some of which aren't the easiest to use and which carry a rather large speed hit. With 5.8.0, Unicode is natively supported in Perl. This upgrade may or may not be expensive, depending on where you are and what you use. When upgrading from one 5.x release series to the next (5.a to 5.b), you generally will have to reinstall every module and run your entire regression suite.
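
    As a rough illustration of what the native support looks like from 5.8 on, here is a minimal sketch using PerlIO encoding layers. The file name unicode_demo.txt is made up for this example, and UTF-8 as the on-disk encoding is an assumption.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $file = 'unicode_demo.txt';

# Writing: the :encoding(UTF-8) layer turns characters into bytes.
open my $out, '>:encoding(UTF-8)', $file
    or die "Cannot write $file: $!";
print {$out} "caf\x{E9} \x{263A}";    # e-acute and a smiley, no newline
close $out or die "Cannot close $file: $!";

# Reading: the same layer turns the bytes back into characters.
open my $in, '<:encoding(UTF-8)', $file
    or die "Cannot read $file: $!";
my $line = <$in>;
close $in;

# length() counts characters, -s reports bytes on disk.
my $chars = length $line;    # 6 characters
my $bytes = -s $file;        # 9 bytes (e-acute is 2, the smiley is 3)
print "$chars characters, $bytes bytes on disk\n";

unlink $file;
```

    The application code only ever sees characters; the layers at the edges do the encoding work, which is what makes an incremental, one-filehandle-at-a-time conversion plausible.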

    Next, you'll need to make sure that the modules you use can handle Unicode. Most can, but there are some exceptions. (I can't think of any offhand, but I remember reading about some. SuperSearch a bit.)

    At this point, you're now ready to determine if your source code and proprietary protocols will handle Unicode. Good luck!

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Thank you!

      I am already at 5.8.0, and I think I'll stay there for a while longer. I'd like to take a small app, one that has its own data source that is distinct from others, so it's already quite isolated. I need to dig in and understand utf8 and ucs2 and what the character sets are.

      I need to start somewhere though, and I was thinking my app would be a good place, but I have to go a lot further than grok utf8 I/O.

        Markus Kuhn maintains an excellent utf8 FAQ. There are a lot of intros out there, but this one is my favorite.
Re: Introduction of Unicode
by hardburn (Abbot) on Jun 30, 2004 at 13:58 UTC

      It doesn't scare me away, but it does scare me.

      Dan's blog is very well written, and conveys a lot of information with humor thrown in. Is Dan on PM? I wonder if he'd approve us lifting that article and using it as part of our Unicode tutorial set that desperately needs to be written?

        IIRC, Dan goes by Elian around here.

        ----
        send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.