in reply to Unicode & charset conversions - how?

Where I work, we needed to convert text in a long list of character sets, all mixed together using ISO 2022 escape codes, with a few custom quirks. It was easy to write our own transcoder that dealt with errors properly. The character sets themselves are all available as tables online: scanned documents are filed as part of the ISO International Register of Coded Character Sets, and the Unicode databases give ready-made, machine-readable conversion tables. I know Big5 is one of the ones cross-referenced in the Unicode book.

So my advice is to get the tables for the charsets you need. They must exist somewhere, even if the conversion takes two "hops" (through an intermediate set such as Unicode).
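
To make the table-driven idea concrete, here is a rough sketch in Python rather than our actual transcoder. It assumes the tables use the plain-text layout of the Unicode mapping files (hex source code, hex Unicode code point, then a `#` comment) and that the charsets are single-byte; a multibyte set such as Big5 would also need a lead-byte step. The two sample tables are tiny illustrative subsets, and the function names are made up for this example.

```python
# Illustrative subsets only; real tables have one row per defined code.
LATIN9_SAMPLE = """\
# ISO 8859-15 (subset)
0x41	0x0041	# LATIN CAPITAL LETTER A
0xA4	0x20AC	# EURO SIGN
0xE9	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
"""

CP1252_SAMPLE = """\
# Windows-1252 (subset)
0x41	0x0041	# LATIN CAPITAL LETTER A
0x80	0x20AC	# EURO SIGN
0xE9	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
"""


def load_table(text):
    """Parse 'source<TAB>unicode' rows into a dict mapping byte -> code point."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 2:
            continue  # empty line, or a code with no Unicode mapping
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def to_unicode(data, table, errors="strict"):
    """First hop: source bytes -> Unicode string."""
    out = []
    for offset, byte in enumerate(data):
        if byte in table:
            out.append(chr(table[byte]))
        elif errors == "replace":
            out.append("\ufffd")  # U+FFFD REPLACEMENT CHARACTER
        else:
            raise ValueError(f"unmapped byte 0x{byte:02X} at offset {offset}")
    return "".join(out)


def from_unicode(text, table, errors="strict"):
    """Second hop: Unicode string -> target bytes, using the table in reverse."""
    reverse = {cp: byte for byte, cp in table.items()}
    out = bytearray()
    for offset, ch in enumerate(text):
        if ord(ch) in reverse:
            out.append(reverse[ord(ch)])
        elif errors == "replace":
            out.append(0x3F)  # '?'
        else:
            raise ValueError(f"no target code for U+{ord(ch):04X} at offset {offset}")
    return bytes(out)


def transcode(data, src_table, dst_table, errors="strict"):
    """Two hops: source charset -> Unicode -> target charset."""
    return from_unicode(to_unicode(data, src_table, errors), dst_table, errors)
```

With `errors="strict"` the offset of the first bad byte is reported, which is the part you usually cannot get out of a black-box converter.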

—John

Re: Re: Unicode & charset conversions - how?
by roundboy (Sexton) on Dec 27, 2002 at 01:28 UTC

    Thanks for the replies. In fact, I looked at my data more carefully and discovered that, out of 600-odd documents, there were only 9 that iconv was unhappy with. By coincidence, my initial testing used 3 of those 9, so the problem looked much more severe than it was!

    So the solution I arrived at was to simply recreate the 9 docs and eliminate the ostensibly bogus data.