in reply to parsing non english

Since you mention a hex editor, I guess you have problems with the charset. Is that correct so far?

Have you found any editor that can open and display your files correctly? Where does the data come from?

In the meantime, read perluniintro.

Re^2: parsing non english
by arcnon (Monk) on Nov 08, 2007 at 17:30 UTC
    It comes from an Access database. Some Japanese fellows translated some information for a doctor, but they placed all the translated names in one field... It has fallen to me to break it up and insert it into a new database.
    Being a lazy American, I can barely speak English. I didn't load any foreign charsets, so I assume I am not seeing a true representation.
    Honestly, I don't have the slightest idea whether this info is Unicode.
    I'm just guessing at the comma character based on what I was told it was, then viewing that character in a hex editor.
      Well, first you have to find out the encoding. Otherwise the data is just binary garbage to you and your programs.

      I'd suggest asking the people who produced the data.

      There are a few other possibilities. For example, the text editor vim has decent charset autodetection.

      You can also try Encode::Guess, but you have to provide it with a list of possible encodings. Try to find out which encodings are used in Japan on Windows.
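A minimal sketch of the Encode::Guess approach, assuming the candidates are cp932 (the Windows flavour of Shift_JIS) and euc-jp. That suspect list is only a guess on my part, and the helper name guess_japanese_encoding and the sample string are made up for illustration. Replace the sample with the raw bytes you read from the database.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: returns the name of the one suspect encoding
# that can decode the octets, or dies if there is none or several.
sub guess_japanese_encoding {
    my ($octets) = @_;

    # Assumed suspect list; adjust it once you know more about the data.
    my $enc = guess_encoding($octets, qw(cp932 euc-jp));

    # On failure or ambiguity guess_encoding() returns an error string
    # instead of an encoding object.
    die "Cannot guess encoding: $enc\n" unless ref $enc;
    return $enc->name;
}

# Stand-in for a field read from the database: three kanji encoded
# as cp932 (written as escapes so this script stays plain ASCII).
my $sample = encode('cp932', "\x{65E5}\x{672C}\x{8A9E}");
print guess_japanese_encoding($sample), "\n";
```

If the guess comes back ambiguous (short strings often are), shorten the suspect list or feed it a longer piece of the data.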

      Once you know the charset, you can decode the data (with decode from the Encode module) and work with it.
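Roughly, the decode-then-work step could look like the sketch below. It assumes the field turned out to be cp932 and that the names are separated by an ASCII comma, a fullwidth comma (U+FF0C) or an ideographic comma (U+3001). Both are assumptions, since the thread does not say which encoding or separator is in the data, and split_names is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode raw octets into Perl characters, then
# split on ASCII, fullwidth, or ideographic commas.
sub split_names {
    my ($octets, $encoding) = @_;

    # FB_CROAK makes decode() die on malformed input instead of
    # silently substituting replacement characters.
    my $text = decode($encoding, $octets, Encode::FB_CROAK);

    my @names;
    for my $name (split /[,\x{FF0C}\x{3001}]/, $text) {
        $name =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        push @names, $name if length $name;   # skip empty pieces
    }
    return @names;
}

# Stand-in for the database field: two names (Yamada, Suzuki) joined
# by an ideographic comma and encoded as cp932.
my $field = encode('cp932', "\x{5C71}\x{7530}\x{3001}\x{9234}\x{6728}");

# Encode again on the way out; UTF-8 is just a convenient choice here.
print encode('UTF-8', $_), "\n" for split_names($field, 'cp932');
```

The point is to split after decoding. Splitting the raw bytes on a comma byte can cut a multi-byte character in half.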