in reply to Re^7: convert files to ansi (8859-1)
in thread convert files to ansi (8859-1)

Well, that explains a lot ... so I need to look for another way to validate the encoding. Is there a known way to do this?

I read about Encode::Guess, maybe I should take a look at it?

Re^9: convert files to ansi (8859-1)
by Corion (Patriarch) on Mar 29, 2017 at 08:46 UTC

    My approach to guessing the encoding would be to look for well-known phrases/trigrams. For example, if you know the language of the text, look for trigrams (or longer sequences) that indicate the encoding.

    "über" is a good German marker word because it appears commonly (enough) in text. The bytes you find for it tell you the encoding:

    "\xFCber"     # ANSI / Latin-1
    "\xC3\xBCber" # UTF-8
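
    A minimal sketch of that marker-word check, assuming the file content has already been read as raw bytes (no decoding layer). The helper name guess_by_marker is made up for illustration, and a real check would probably use several marker words rather than just one:

```perl
use strict;
use warnings;

# Hypothetical helper: classify raw bytes by how the marker word "über"
# is encoded in them. Returns 'UTF-8', 'ISO-8859-1' or 'unknown'.
sub guess_by_marker {
    my ($bytes) = @_;

    # Test the two-byte UTF-8 form first; it is the more specific pattern.
    return 'UTF-8'      if $bytes =~ /\xC3\xBCber/;

    # A single 0xFC byte before "ber" points to ANSI / Latin-1.
    return 'ISO-8859-1' if $bytes =~ /\xFCber/;

    # Marker word not present: this heuristic cannot decide.
    return 'unknown';
}
```

    Calling guess_by_marker("\xFCber") gives 'ISO-8859-1' and guess_by_marker("\xC3\xBCber") gives 'UTF-8'. Text without the marker word comes back as 'unknown', so this only helps when you know the language and the word actually occurs.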
      We decided to develop a bytewise reader and converter. We need such an algorithm in multiple places anyway. Putting the effort in seems to us the most productive way for now ...

      Thanks a lot.