Re: parsing non english

in reply to parsing non english

Since you mention a hex editor, I guess you have problems with the character set (charset). Is that correct so far?

Have you found any editor that can open and display your files correctly? Where does the data come from?

In the meantime, read perluniintro.
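The central idea in perluniintro is that encoded bytes and Perl characters are different things, and `decode` from Encode converts the first into the second. Here is a minimal sketch. For illustration only, it assumes the data is UTF-8, which nobody knows yet. Three bytes turn into a single character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three bytes E3 80 81 are the UTF-8 encoding of U+3001 IDEOGRAPHIC COMMA.
my $bytes = "\xE3\x80\x81";

# decode() turns encoded bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

printf "%d bytes, %d character(s), code point U+%04X\n",
    length($bytes), length($chars), ord($chars);
```

This should print `3 bytes, 1 character(s), code point U+3001`. If you split or match on the raw bytes instead of the decoded characters, a multi-byte character gets cut in half. That is one common way "non-English" parsing goes wrong.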

Re^2: parsing non english by arcnon (Monk) on Nov 08, 2007 at 17:30 UTC
It comes from an Access database. Some Japanese fellows translated some information for a doctor, but they put all the translated names into one field. The job of breaking it up and inserting the names into a new database has fallen to me. Being a lazy American, I can barely speak English. I didn't load any foreign charsets, so I assume I am not seeing a true representation of the text. Honestly, I don't have the slightest idea whether this data is Unicode. I am just guessing the comma character based on what I was told it was, and then viewing that character in a hex editor.
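Instead of guessing the separator from its appearance, you can compute the bytes each candidate comma would have in several encodings and search for those sequences in the hex editor. This is a sketch. The three candidate characters and the four encodings are assumptions, not something known about this data. `hex_bytes` is a hypothetical helper. `encode` is the real function from Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: format a byte string the way a hex editor shows it, e.g. "81 41".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

# Candidate comma characters (assumption: we don't know which one was typed).
my @commas = (
    [ 'ASCII COMMA U+002C',       "\x{002C}" ],
    [ 'IDEOGRAPHIC COMMA U+3001', "\x{3001}" ],
    [ 'FULLWIDTH COMMA U+FF0C',   "\x{FF0C}" ],
);

# Encodings worth checking for Japanese text (also an assumption).
my @encodings = qw(cp932 euc-jp UTF-8 UTF-16LE);

for my $c (@commas) {
    my ($label, $char) = @$c;
    for my $enc (@encodings) {
        # encode() turns the character into the bytes it would occupy in $enc
        printf "%-26s %-9s %s\n", $label, $enc, hex_bytes(encode($enc, $char));
    }
}
```

Suppose one of these sequences appears between the names in the hex dump, for example `81 41` (the ideographic comma in cp932). That is a strong hint about both the separator and the encoding.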
Re^3: parsing non english by moritz (Cardinal) on Nov 08, 2007 at 17:46 UTC
Well, first you have to find out the encoding. Otherwise the data is just binary garbage to you and your programs. I'd suggest asking the people who produced the data. There are a few other possibilities; for example, the text editor vim has decent charset autodetection. You can also try Encode::Guess, but you have to provide it with a list of possible encodings. Try to find out which encodings are used in Japan on Windows. Once you know the charset, you can decode the data (with `decode` from the module Encode) and work with it.
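The workflow above can be sketched as two steps: decode the field, then split it. `decode` and `guess_encoding` are real functions from Encode and Encode::Guess. `decode_field` and `split_names` are hypothetical helpers. The suspect list (cp932, euc-jp) and the three comma variants are assumptions. Encode::Guess returns an error string rather than an encoding object when the data fits none of the suspects, or more than one of them. So once you know the real charset, pass it explicitly.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Encode::Guess;

# Hypothetical helper: decode the raw bytes of one database field into a
# Perl character string. Pass $known once you know the charset; otherwise
# Encode::Guess tries the (assumed) Japanese suspects below.
sub decode_field {
    my ($raw, $known) = @_;
    return decode($known, $raw) if defined $known;
    my $decoder = guess_encoding($raw, qw(cp932 euc-jp));
    # On failure guess_encoding returns an error message instead of an object
    ref $decoder or die "Could not guess the encoding: $decoder\n";
    return $decoder->decode($raw);
}

# Hypothetical helper: split decoded text on an ASCII comma, an
# IDEOGRAPHIC COMMA (U+3001) or a FULLWIDTH COMMA (U+FF0C), then trim.
sub split_names {
    my ($text) = @_;
    my @names = split /\s*[,\x{3001}\x{FF0C}]\s*/, $text;
    s/^\s+|\s+$//g for @names;    # trim leading/trailing whitespace
    return grep { length } @names; # drop empty entries
}
```

Usage might look like `my @names = split_names(decode_field($raw_bytes, 'cp932'));`. Before you insert the names into the new database, you will probably need to `encode` them into whatever charset that database expects.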

In Section Seekers of Perl Wisdom