The first thing to do is to figure out whether you even need Unicode. Converting an application to support Unicode is an expensive and time-consuming process. Often, it's easier to rewrite the app(s) from scratch than to convert an existing one.
Second, determine which kind of Unicode you want to support. Usually this doesn't matter much, because most (if not all) open-source solutions support all the different encodings. But some proprietary systems may support UTF-8 but not UCS-2, etc. There are conversions between most encodings, but they're time-consuming to set up.
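To make the UTF-8 vs. UCS-2 difference concrete, here is a minimal sketch in Python (not Perl, and not tied to any particular proprietary system) that converts between the two. It assumes big-endian UTF-16 can stand in for UCS-2, which only holds for characters in the Basic Multilingual Plane; the function names are made up for illustration.

```python
# Python has no codec named "ucs2". For characters in the Basic
# Multilingual Plane (U+0000..U+FFFF), big-endian UTF-16 produces
# exactly the same bytes as UCS-2, so it stands in for UCS-2 here.
UCS2 = "utf-16-be"


def to_ucs2(text: str) -> bytes:
    """Encode text as UCS-2, refusing characters outside the BMP."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} cannot be represented in UCS-2")
    return text.encode(UCS2)


def utf8_to_ucs2(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UCS-2 bytes by decoding to code points first."""
    return to_ucs2(data.decode("utf-8"))


def ucs2_to_utf8(data: bytes) -> bytes:
    """Convert UCS-2 bytes back to UTF-8 bytes."""
    return data.decode(UCS2).encode("utf-8")


if __name__ == "__main__":
    text = "Gr\u00fc\u00dfe"  # "Grüße"
    utf8 = text.encode("utf-8")
    ucs2 = utf8_to_ucs2(utf8)
    print(len(text), len(utf8), len(ucs2))  # 5 7 10
    assert ucs2_to_utf8(ucs2) == utf8
```

The conversion always goes through decoded characters rather than trying to shuffle bytes directly, and the BMP check is where a "supports UCS-2 but not everything" system would start losing data.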
Next, determine whether your architecture currently supports Unicode. Depending on what your systems do, you may have to make sure your OS(es), database(s), and other third-party applications support it. If they don't, you have to start there.
After all that's done, upgrade your Perl to 5.8.4 and deal with that. Before 5.8.0, Unicode support came through modules, some of which aren't the easiest to use and which carry a rather large speed hit. As of 5.8.0, Unicode is supported natively in Perl. This upgrade may or may not be expensive, depending on where you are and what you use. When upgrading from 5.a to 5.b, you will generally have to reinstall every module and run your entire regression suite.
Next, you'll need to make sure that the modules you use can handle Unicode. Most can, but there are some exceptions. (I can't think of any offhand, but I remember reading about some. SuperSearch a bit.)
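One cheap way to check whether a module handles Unicode is a round-trip test: push a string containing non-ASCII characters through it and see whether the same string comes back. The sketch below is in Python rather than Perl; `round_trips` and the Latin-1-only "legacy" codec are hypothetical stand-ins for whatever module you're actually evaluating.

```python
import json

# Sample text mixing ASCII, Latin-1, CJK, and a character outside the BMP.
SAMPLE = "caf\u00e9 \u65e5\u672c \U0001F600"


def round_trips(encode, decode, sample=SAMPLE):
    """Return True if `sample` survives encode() then decode() unchanged.

    `encode`/`decode` stand in for whatever a module does with your data
    (serializing, storing, escaping, ...). Any exception counts as a failure.
    """
    try:
        return decode(encode(sample)) == sample
    except Exception:
        return False


# Hypothetical "legacy" codec that only understands Latin-1.
def legacy_encode(s):
    return s.encode("latin-1")


def legacy_decode(b):
    return b.decode("latin-1")


if __name__ == "__main__":
    print("json:", round_trips(json.dumps, json.loads))          # json: True
    print("legacy:", round_trips(legacy_encode, legacy_decode))  # legacy: False
```

Note that the Latin-1 codec passes if you only test with "café", which is why the sample deliberately includes CJK and a non-BMP character.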
At this point, you're now ready to determine if your source code and proprietary protocols will handle Unicode. Good luck!
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
Thank you!
I am already at 5.8.0, and I think I'll stay there for a while longer.
I'd like to take a small app, one that has its own data source distinct from the others, so it's already quite isolated.
I need to dig in and understand utf8 and ucs2 and what the character sets are.
I need to start somewhere, though, and I was thinking my app would be a good place, but I'll have to go a lot further than grokking utf8 I/O.
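For the utf8 I/O part, the core idea fits in a few lines: text only survives a trip through a file if the encoding is named on both the write and the read. Here is a small Python sketch (roughly what Perl 5.8's `:encoding(...)` I/O layers on `open` and `binmode` are for); decoding the same UTF-8 bytes as Latin-1 produces the familiar mojibake. The helper names are made up for illustration.

```python
import os
import tempfile

TEXT = "na\u00efve r\u00e9sum\u00e9"  # "naïve résumé"


def write_utf8(path, text):
    # Name the encoding explicitly instead of relying on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_as(path, encoding):
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo(text=TEXT):
    """Write text as UTF-8, then read it back correctly and incorrectly."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_utf8(path, text)
        size = os.path.getsize(path)    # bytes on disk, not characters
        good = read_as(path, "utf-8")   # matching decoding
        bad = read_as(path, "latin-1")  # same bytes, wrong decoding
        return size, good, bad
    finally:
        os.remove(path)


if __name__ == "__main__":
    size, good, bad = demo()
    print(size, len(good))  # 15 12
    print(ascii(bad))       # 'na\xc3\xafve r\xc3\xa9sum\xc3\xa9'
```

The byte count vs. character count mismatch (15 vs. 12) is the first thing that tends to break in code that assumes one byte per character.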
Markus Kuhn maintains an excellent utf8 FAQ. There are a lot of intros out there, but this one is my favorite.
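The heart of UTF-8, which Kuhn's FAQ walks through, is a small table of bit patterns: ASCII stays a single byte, and larger code points spread over two to four bytes, each continuation byte starting with `10`. A hand-rolled Python encoder (purely illustrative; real code should use the built-in codec) makes the layout visible:

```python
def utf8_encode_codepoint(cp: int) -> bytes:
    """Encode one code point by hand using the UTF-8 bit patterns."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:     # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:    # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode(text: str) -> bytes:
    """Encode a whole string one code point at a time."""
    return b"".join(utf8_encode_codepoint(ord(ch)) for ch in text)


if __name__ == "__main__":
    for ch in "A\u00e9\u20ac\U0001F600":
        print(f"U+{ord(ch):04X}", utf8_encode_codepoint(ord(ch)).hex(" "))
    # U+0041 41
    # U+00E9 c3 a9
    # U+20AC e2 82 ac
    # U+1F600 f0 9f 98 80
```

One nice property visible here: no byte of a multi-byte sequence is below 0x80, so ASCII-oriented tools never mistake part of a non-ASCII character for a `/` or a NUL.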
It doesn't scare me away, but it does scare me.
Dan's blog is very well written and conveys a lot of information with humor thrown in. Is Dan on PM? I wonder if he'd let us use that article as part of the Unicode tutorial set that desperately needs to be written.