dragonchild has asked for the wisdom of the Perl Monks concerning the following question:

Before I begin, I'm using Perl 5.005_03 (and cannot upgrade), so I can't use Encode. That would be my preferred solution, but it's not allowed by my environment.

I want to be able to use XML::Parser, but the strings in many of my templates come from a database that stores them in various encodings. All the European encodings are handled just fine, but the Chinese (Big5), Japanese (Shift-JIS), and Korean (EUC-KR) encodings are not.

Does anyone know of any encoding files for XML::Parser (or any other XML parser) to handle these legacy encodings? Or, barring that, a way of converting them to UCS-2 or UTF8?

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Replies are listed 'Best First'.
Re: Conversion of language encodings to UTF8
by goldbug (Initiate) on Feb 26, 2003 at 16:58 UTC
    Depending on the kind of database and its internal encoding, it may be possible to set the encoding through the database client. This is what I did with IBM DB2, rather than converting encodings in my script. There may be something comparable in the DB you are using. This way I didn't have to worry about which encoding the DB was sending me. Just add something like this before making the DB connection and issuing SQL:

    system "db2set db2codepage=1208";   # code page 1208 is UTF-8
Re: Conversion of language encodings to UTF8
by webfiend (Vicar) on Feb 26, 2003 at 19:39 UTC

    Have you had a chance to look at Unicode::MapUTF8? It fulfills pretty much the role you need, and last I knew was still useful under 5.005.
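
A minimal sketch of how Unicode::MapUTF8 might be used for this, assuming the module and the mapping modules it depends on are installed. The helper name `legacy_to_utf8` and the sample bytes are illustrative only. Which charset names are accepted depends on the installation, so the sketch asks `utf8_supported_charset` before converting.

```perl
#!/usr/bin/perl -w
use strict;
use Unicode::MapUTF8 qw(to_utf8 utf8_supported_charset);

# Hypothetical helper (not part of the module): convert a string from a
# legacy charset to UTF-8, or return undef if the charset is unknown to
# this installation of Unicode::MapUTF8.
sub legacy_to_utf8 {
    my ($string, $charset) = @_;
    return undef unless utf8_supported_charset($charset);
    return to_utf8({ -string => $string, -charset => $charset });
}

# Sample input: the Big5 byte pairs 0xA4A4 and 0xA4E5 (two Chinese characters).
my $big5 = "\xa4\xa4\xa4\xe5";
my $utf8 = legacy_to_utf8($big5, 'big5');

if (defined $utf8) {
    print "Converted ", length($big5), " Big5 bytes to UTF-8\n";
} else {
    print "Charset 'big5' is not supported here; check the installed maps\n";
}
```

The same helper could be called with whatever names the installation reports for Shift-JIS and EUC-KR; those spellings are not assumed here.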


    I just realized that I was using the same sig for nearly three years.

Re: Conversion of language encodings to UTF8
by allolex (Curate) on Feb 26, 2003 at 18:44 UTC

    Might you be able to use the tools recode or iconv? Recode is available on GNU/Linux and other Unix systems, and on Windows. I don't know about iconv's availability on anything other than GNU/Linux, but it might have been ported to Windows by now.

    You could use these to prepare your files for actual processing with Perl.

    Update: PodMaster has just pointed out to me that there is a CPAN module interface for iconv available. Check it out.
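
A rough sketch of the iconv route from inside Perl, using Text::Iconv, which is one CPAN interface to iconv (whether it is the module meant above is an assumption). The helper name `to_utf8_via_iconv`, the language keys, and the charset spellings are illustrative; the names are handed to the system's iconv library, so they may need adjusting per platform.

```perl
#!/usr/bin/perl -w
use strict;
use Text::Iconv;

# Charset names are passed straight to the system's iconv library, so the
# spellings below (as used by GNU iconv) are an assumption about the platform.
my %charset_for = (
    chinese  => 'BIG5',
    japanese => 'SHIFT_JIS',
    korean   => 'EUC-KR',
);

# Hypothetical helper: convert $string from the legacy charset configured
# for $language to UTF-8. Returns undef if the conversion fails.
sub to_utf8_via_iconv {
    my ($string, $language) = @_;
    my $from = $charset_for{$language}
        or die "No charset configured for '$language'\n";
    my $converter = Text::Iconv->new($from, 'UTF-8');
    return $converter->convert($string);
}

# Sample input: the Big5 byte pairs 0xA4A4 and 0xA4E5.
my $utf8 = to_utf8_via_iconv("\xa4\xa4\xa4\xe5", 'chinese');
print defined $utf8 ? "conversion succeeded\n" : "conversion failed\n";
```

Once the strings are UTF-8 they can be handed to XML::Parser, which reads UTF-8 natively. If a template's XML declaration still names the old encoding, that declaration would need adjusting as well.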

    --
    Allolex

Re: Conversion of language encodings to UTF8
by grantm (Parson) on Feb 26, 2003 at 18:40 UTC

    You might want to search the archives of the perl-xml mailing list - I seem to recall the question has been asked there. Failing that, post your question on the list.