Re^3: UTF8 Validity

Encode::Guess is likely to be helpful for figuring out the source encodings for many of the Asian (multi-byte-char) strings, though it might not help much for distinguishing among single-byte encodings. Worth a try.
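For anyone who wants to try it, here is a minimal sketch of that approach. It assumes you can name a handful of candidate encodings yourself. The helper name guess_and_decode and the sample strings are made up for illustration; guess_encoding is the function Encode::Guess exports. It returns an encoding object on a unique match and a plain diagnostic string otherwise, so the result has to be checked with ref before it is used.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;    # exports guess_encoding by default

# Hypothetical helper (not part of Encode::Guess): returns
# (decoded_text, encoding_name) on a unique match, or
# (undef, diagnostic_message) when the guess fails or is ambiguous.
sub guess_and_decode {
    my ($bytes, @suspects) = @_;
    my $enc = guess_encoding($bytes, @suspects);
    return (undef, $enc) unless ref $enc;    # a plain string means "no unique guess"
    return ($enc->decode($bytes), $enc->name);
}

# Multi-byte case: sample EUC-JP bytes tested against three Japanese encodings.
my $euc = encode('euc-jp', "\x{65E5}\x{672C}\x{8A9E}");
my ($jp_text, $jp_info) = guess_and_decode($euc, qw(euc-jp shiftjis 7bit-jis));
print defined $jp_text ? "decoded as $jp_info\n" : "no unique guess: $jp_info\n";

# Single-byte case: every byte is valid in both Latin encodings,
# so the guess cannot settle on one of them.
my ($l_text, $l_info) = guess_and_decode("caf\xE9", qw(iso-8859-1 iso-8859-2));
print defined $l_text ? "decoded as $l_info\n" : "no unique guess: $l_info\n";
```

The second call illustrates the single-byte limitation: when two suspects both accept every byte, all you get back is a message naming the candidates.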

Re^4: UTF8 Validity by Anonymous Monk on Feb 22, 2008 at 11:07 UTC
Encode::Guess is lame because the user needs to tell it which encodings the binary data might be in. Use Encode::Detect instead; it is the same detector used in Mozilla browsers.
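A rough sketch of what that could look like, assuming the CPAN module Encode::Detect is installed (it is not part of core Perl). The helper name detect_charset and the sample bytes are hypothetical; the detect function comes from Encode::Detect::Detector. The sketch loads the module lazily so that it still runs, and simply reports nothing, when the module is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a charset name, or undef when the module
# is not installed or the detector reaches no conclusion.
sub detect_charset {
    my ($bytes) = @_;
    # Encode::Detect is a CPAN module, so load it at run time and
    # fall back quietly if it is unavailable.
    return undef unless eval { require Encode::Detect::Detector; 1 };
    return scalar Encode::Detect::Detector::detect($bytes);
}

# Sample UTF-8 encoded bytes, written as escapes to keep the source ASCII.
my $name = detect_charset("caf\xC3\xA9 na\xC3\xAFve r\xC3\xA9sum\xC3\xA9");
print defined $name ? "detected: $name\n" : "no detection available\n";
```

No suspects list is needed here, but the detector is statistical, so short or unusual input may give no answer or a wrong one.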
Re^5: UTF8 Validity by menolly (Hermit) on Feb 22, 2008 at 18:23 UTC
I've been using Encode::Guess, but have had trouble building a suspects list for some data. However, Firefox hasn't been able to handle the problem data appropriately either, so if Encode::Detect uses the same method, I doubt it would have done any better on this data.