encoding from Unicode to GB

dash2 has asked for the wisdom of the Perl Monks concerning the following question:

Okay, this has baffled me. I am trying to write a script to search for multilingual domain names, and this requires that I read Chinese characters typed into a browser. Micro$oft tends to encode these as &#NUMBER;, where NUMBER is the Unicode number of the Chinese character. I know how to convert this from Unicode to GB encoding (it involves a very large hash), but I don't know how to encode the resulting number into a GB string. E.g., if I have some hex GB numbers (0x77FE 0x44DF etc.), how do I convert these into a GB string (using GB2312-80) to send over the internet?
foolish newbie earnestly requests help.
Dave

btw: apologies for the ugly formatting. I am viewing in Konqueror which for some reason prints the TEXTAREA as a very narrow strip.

Re: encoding from Unicode to GB by clemburg (Curate) on Nov 28, 2000 at 18:06 UTC
You might be interested in looking at http://www.mandarintools.com. It has an entry named Chinese Perl Library and a Perl-based Chinese Encoding Guesser. See also the Perl, Unicode and I18N FAQ. Disclaimer: I have no practical experience in doing these conversions, I just searched for tools.
Christian Lemburg, Brainbench MVP for Perl, http://www.brainbench.com
Re: encoding from Unicode to GB by lhoward (Vicar) on Nov 28, 2000 at 18:20 UTC
If you set up your form properly: `<form accept-charset="utf-8">` the characters you submit will be sent to the web server in Unicode (UTF-8), not in the alternate encoding scheme you describe. It will also help if you put the document itself into UTF-8 as well, by setting the content-type (either in the HTTP header or in a meta tag) to charset=utf-8.
Update: here are the links I promised:
i18n: HTML Character set issues beyond HTML3.2 (be sure to see his FORM page)
HTML Unleashed: Internationalizing HTML (this is the whole chapter from the book online, not a link to buy the book)
World-Wide Character Sets, Languages, and Writing Systems, from w3c
Unicode.org, homepage of the Unicode Consortium
In addition to the Unicode conversion modules mentioned above (Unicode::Map, Unicode::Map8, Unicode::MapUTF8) you may also want to check out the Text::Iconv module, which also provides character set conversion functionality.
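Once the form delivers UTF-8, the remaining step is re-encoding those bytes as GB2312. The following is only a sketch, and it swaps in the core Encode module (which ships with Perl 5.8 and later) rather than any of the modules named above. The helper name utf8_to_gb is made up for illustration, and 'euc-cn' is assumed to be the form of GB2312 the receiving end expects.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a UTF-8 form value as GB2312 (EUC-CN) bytes.
sub utf8_to_gb {
    my ($utf8_bytes) = @_;
    # FB_CROAK makes malformed UTF-8 or unmappable characters die
    # instead of being silently replaced.
    my $chars = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK);
    return encode('euc-cn', $chars, Encode::FB_CROAK);
}

# The UTF-8 bytes for U+4E2D U+6587, as a browser would submit them.
my $gb = utf8_to_gb("\xE4\xB8\xAD\xE6\x96\x87");
print join(' ', map { sprintf '%02X', $_ } unpack('C*', $gb)), "\n";
# prints "D6 D0 CE C4"
```

Dying on bad input is a choice made for this sketch; a real script might prefer to wrap the call in eval and report which value could not be converted.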
Re: Re: encoding from Unicode to GB by dash2 (Hermit) on Nov 28, 2000 at 18:51 UTC
Thanks for the useful hints. Some references would be great. It would also be nice if someone could explain to me how one goes about encoding strings as bytes for sending over the net - a high-level description would be good, because I really feel rather blind in this area.
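At a high level, a string of characters becomes bytes by mapping each character to its number in the target character set and then writing those numbers out as fixed byte sequences. A rough sketch of both halves follows, assuming the core Encode module (Perl 5.8 or later) in place of a hand-built hash, and assuming the GB numbers are already in the two-byte form that goes on the wire. The helper names decode_ncr, to_gb_bytes and pack_gb_codes are invented for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Turn decimal character references such as "&#20013;" into Perl characters.
sub decode_ncr {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

# Encode a character string as GB2312 bytes in EUC-CN form.
sub to_gb_bytes {
    my ($chars) = @_;
    return encode('euc-cn', $chars);
}

# Pack 16-bit GB code numbers into a byte string, high byte first.
sub pack_gb_codes {
    my (@codes) = @_;
    return pack('n*', @codes);
}

my $bytes = to_gb_bytes(decode_ncr('&#20013;&#25991;'));
print join(' ', map { sprintf '%02X', $_ } unpack('C*', $bytes)), "\n";
# prints "D6 D0 CE C4"

# The same bytes, built directly from the GB code numbers.
print "same\n" if pack_gb_codes(0xD6D0, 0xCEC4) eq $bytes;
```

If you already have the GB numbers from your own lookup table, pack('n*', ...) is all the "encoding" there is: each number becomes two bytes, and the resulting string can be printed to a socket or used in a request as-is. Whether the high bit of each byte needs to be set depends on which form of GB2312 your table uses, so check that against what the receiving server expects.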
Re: encoding from Unicode to GB by snax (Hermit) on Nov 28, 2000 at 17:58 UTC
Once you're in Unicode you can use the Unicode modules. They're very nice. Check CPAN for Unicode.