in reply to What's the best way to detect character encodings, Windows-1252 v. UTF-8?
You might want to look at Encoding-FixLatin - I created it for a very similar situation. In my case I had a Postgres database from an application that had treated text as 8-bit binary strings. Each record was one of: ASCII, UTF-8, ISO-8859-1 or CP1252, but the DB dump as a whole was a mixture of all these. The documentation for Encoding::FixLatin describes the heuristics it uses.
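For readers who want a feel for this kind of byte-level heuristic, here is a minimal Python sketch. It is not the module's actual code, and the function name `fix_mixed_latin` is made up for illustration. It assumes the usual strategy for this problem: pass ASCII through, keep any well-formed UTF-8 sequence, treat stray bytes in 0x80-0x9F as CP1252, and treat any other stray high byte as ISO-8859-1.

```python
def fix_mixed_latin(data: bytes) -> str:
    """Decode bytes that may mix ASCII, UTF-8, ISO-8859-1 and CP1252.

    Hypothetical helper for illustration only; the heuristic is an
    assumption, not a copy of Encoding::FixLatin.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]

        # Plain ASCII passes straight through.
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue

        # Work out how long a UTF-8 sequence starting with this byte would be.
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0  # not a valid UTF-8 lead byte

        if length:
            try:
                # Strict decoding rejects truncated or malformed sequences.
                out.append(data[i:i + length].decode("utf-8"))
                i += length
                continue
            except UnicodeDecodeError:
                pass  # fall through and treat the byte as single-byte text

        if b <= 0x9F:
            # 0x80-0x9F: printable in CP1252, control codes in ISO-8859-1.
            try:
                out.append(bytes([b]).decode("cp1252"))
            except UnicodeDecodeError:
                out.append(chr(b))  # byte is unassigned in CP1252
        else:
            # ISO-8859-1 maps each byte to the same code point.
            out.append(chr(b))
        i += 1

    return "".join(out)


if __name__ == "__main__":
    mixed = b"caf\xc3\xa9, caf\xe9, \x93quoted\x94"
    print(fix_mixed_latin(mixed))
```

The heuristic has a known blind spot: Latin-1 text that happens to contain a byte pair forming valid UTF-8 (for example 0xC3 0xA9) will be read as UTF-8. In real text that is rare, which is why this approach tends to work on mixed dumps.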
Re^2: What's the best way to detect character encodings, Windows-1252 v. UTF-8?
by Khen1950fx (Canon) on Jun 18, 2011 at 11:37 UTC