Re^2: parsing non english

It comes from an Access database. Some Japanese fellows translated some information for a doctor, but they placed all the translated names in one field... It has fallen to me to break it up and insert it into a new database.
Being a lazy American, I can barely speak English. I didn't load any foreign charsets, so I assume I am not seeing a true representation.
Honestly, I don't have the slightest idea whether this info is Unicode.
I am just guessing at the comma character based on what I was told it was, then viewing that character in a hex editor.

Re^3: parsing non english by moritz (Cardinal) on Nov 08, 2007 at 17:46 UTC
Well, first you have to find out the encoding. Otherwise the data is just binary garbage to you and your programs. I'd suggest asking the people who produced the data. There are a few other possibilities: for example, the text editor vim has decent charset autodetection. You can also try Encode::Guess, but you have to provide it with a list of possible encodings, so try to find out which encodings are used in Japan on Windows. Once you know the charset, you can decode the data (with `decode` from the Encode module) and work with it.
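A minimal sketch of the guess-then-decode approach described above. Everything specific here is an assumption for illustration: the sample bytes are made up (they stand in for one field read from the Access export), the candidate list `cp932` and `euc-jp` is only a plausible starting point for Japanese text produced on Windows, and the separator is assumed to be the ideographic comma U+3001, which may not match the character seen in the hex editor.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Encode::Guess;    # exports guess_encoding

# Hypothetical sample: two names joined by an ideographic comma,
# encoded as cp932 to imitate raw bytes from the database field.
my $bytes = encode('cp932', "\x{540D}\x{524D}\x{3001}\x{4F4F}\x{6240}");

# Ask Encode::Guess to choose among the suspected encodings (an assumed list).
# It returns an encoding object on success, or an error string otherwise.
my $enc = guess_encoding($bytes, qw/cp932 euc-jp/);
ref $enc or die "Could not guess the encoding: $enc\n";

# Decode the bytes into Perl characters. Once the charset is known for
# certain, decode('cp932', $bytes) does the same without any guessing.
my $text = $enc->decode($bytes);

# Split on the assumed separator, now a single character rather than raw bytes.
my @names = split /\x{3001}/, $text;

binmode STDOUT, ':encoding(UTF-8)';
print scalar(@names), " names decoded as ", $enc->name, "\n";
print "$_\n" for @names;
```

Guessing can fail or come back ambiguous on short or mixed input, which is why asking the people who produced the data is still the safer route.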
