Re: Character set conversions

Between the Encode module that ships with Perl 5.8.1 and the various conversion tools available from other sources, I would actually tend to prefer the former, especially for Arabic. Last time I tried iconv to go from (e.g.) iso-8859-6 to utf8, it took the liberty of converting ASCII digits to "Arabic script" digits, which is neither justified nor desirable. (The original 5.8.0 encoding table did the same thing, but that was fixed in 5.8.1. For that matter, maybe iconv has been fixed since then as well -- I have seen a few different versions of iconv in operation, and their character-set inventories seemed oddly different.)
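For what it's worth, here is a minimal sketch of how one might check the digit behavior with Encode on a given installation. The sample bytes and variable names are my own illustration; it assumes Perl 5.8.1 or later, and that the "Arabic script" digits in question are the Arabic-Indic digits U+0660 through U+0669.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample iso-8859-6 input (made up for illustration):
# four ASCII digits, a space, then ALEF (0xC7) and LAM (0xE4).
my $bytes = "2003 \xC7\xE4";

# Decode the bytes into a Perl character string.
my $chars = decode("iso-8859-6", $bytes);

# Count how many digits stayed ASCII and how many were turned
# into Arabic-Indic digits (U+0660..U+0669).
my $ascii_digits  = () = $chars =~ /[0-9]/g;
my $arabic_digits = () = $chars =~ /[\x{0660}-\x{0669}]/g;

printf "ASCII digits: %d, Arabic-Indic digits: %d\n",
    $ascii_digits, $arabic_digits;
```

If the second count is anything other than zero, the mapping table in use is rewriting digits. The same test can be run on the output of iconv or any other converter by decoding its UTF-8 output and counting the same way.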

My main point is: be careful when doing character-encoding conversion on any non-European language. Command-line utilities (iconv, etc.) may perform replacements that are inappropriate, and may yield more "?" (no-such-character) outputs than you would expect -- and sometimes this will be due to unexpected properties of the input data.

Encode.pm might do the same in some cases, when left to its default behavior, but at least you have the ability to change its behavior, and you can create and use alternate character mapping tables if necessary. (Check out "perldoc enc2xs".)
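As a rough illustration of "changing its behavior", the sketch below uses the optional CHECK argument of Encode's encode() with the documented FB_CROAK and FB_PERLQQ fallback constants. The sample string and variable names are made up; the only assumption about the data is that U+20AC (EURO SIGN) has no slot in iso-8859-6.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text (made up): the euro sign cannot be represented in iso-8859-6.
my $text = "price: 5\x{20AC}";

# Default behavior (no CHECK argument): the unmappable character is
# silently replaced by the encoding's substitution character.
my $lossy = encode("iso-8859-6", $text);

# FB_CROAK: die instead of substituting. With a CHECK value, encode()
# may modify its source argument, so work on a copy.
my $copy   = $text;
my $strict = eval { encode("iso-8859-6", $copy, Encode::FB_CROAK()) };
my $error  = $@;

# FB_PERLQQ: keep the information by writing a \x{HHHH} escape instead.
my $copy2   = $text;
my $escaped = encode("iso-8859-6", $copy2, Encode::FB_PERLQQ());

print "default:   $lossy\n";
print "FB_CROAK:  ", (defined $strict ? $strict : "died: $error");
print "FB_PERLQQ: $escaped\n";
```

Which fallback is right depends on the job: dying is useful when you want to find out about bad data early, while the escape form lets a batch run finish and still shows exactly which characters did not map. Building a custom mapping table with enc2xs is the heavier option, for when the table itself is what needs changing.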
