in reply to Re^2: UTF8 Validity
in thread UTF8 Validity

Encode::Guess is likely to help identify the source encoding of many of the Asian (multi-byte-character) strings, though it probably won't help much in distinguishing among single-byte encodings. Worth a try.
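
For illustration, a minimal sketch of what that might look like. The helper name guess_name and the sample strings are made up for this example; guess_encoding is the function Encode::Guess exports, and it returns an encoding object on success or a plain error string on failure.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;

# Hypothetical helper (not part of Encode::Guess): returns the name of
# the guessed encoding, or undef if no single suspect fits the data.
sub guess_name {
    my ($octets, @suspects) = @_;
    my $enc = guess_encoding($octets, @suspects);
    # Success yields an encoding object; failure yields an error string.
    return ref($enc) ? $enc->name : undef;
}

# Made-up sample: Japanese text encoded as EUC-JP, standing in for real input.
my $octets = encode('euc-jp', "\x{65E5}\x{672C}\x{8A9E}");
my $name   = guess_name($octets, qw(euc-jp shiftjis 7bit-jis));
print defined $name ? "guessed: $name\n" : "no unambiguous guess\n";

# Single-byte suspects: nearly any byte string is valid in both, so the
# guess is expected to come back ambiguous rather than pick one.
my $latin = guess_name("caf\xE9", qw(iso-8859-1 iso-8859-2));
print defined $latin ? "guessed: $latin\n" : "no unambiguous guess\n";
```

The second call shows the single-byte limitation: with several suspects that each accept every byte, there is usually nothing for the guesser to rule out.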

Re^4: UTF8 Validity
by Anonymous Monk on Feb 22, 2008 at 11:07 UTC

    Encode::Guess is lame because the user has to tell it up front which encodings to suspect.

    Use Encode::Detect instead. It wraps the same charset detector that Mozilla browsers use.

      I've been using Encode::Guess, but have had trouble building a suspects list for some data. However, Firefox hasn't been able to handle the problem data properly either, so if Encode::Detect uses the same method, I doubt it would've done any better on this data.