
My approach to guessing the encoding would be to look for well-known byte sequences. If you know the language of the text, search for trigrams (or longer sequences) that contain a non-ASCII character, because those are stored as different bytes in each encoding.

"über" is a good candidate for German: it appears commonly enough in ordinary text, and the bytes you find for it tell you which encoding you have:

    "\xFCber"      # ANSI / Latin-1
    "\xC3\xBCber"  # UTF-8

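A minimal sketch of that probe, written in Python only for illustration (the same logic carries over to Perl byte strings). The names `PROBES` and `guess_encoding` are made up for this example, and the probe table holds just the one word from above. A real table would need more words, and capitalized forms such as "Über" are not covered here.

```python
from typing import Optional

# Hypothetical probe table: the bytes for "über" in each candidate encoding.
PROBES = {
    "latin-1": b"\xfcber",
    "utf-8": b"\xc3\xbcber",
}


def guess_encoding(data: bytes) -> Optional[str]:
    """Return the encoding whose probe occurs most often, or None if unclear."""
    # Count how often each probe byte sequence occurs in the raw data.
    counts = {name: data.count(pattern) for name, pattern in PROBES.items()}
    # Rank the candidates by hit count, highest first.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    (best, best_hits), (_, runner_up_hits) = ranked[0], ranked[1]
    # No hits at all, or a tie, means the probe cannot decide.
    if best_hits == 0 or best_hits == runner_up_hits:
        return None
    return best
```

Calling `guess_encoding(open(path, "rb").read())` on a file (with `path` standing in for your own file name) returns `"latin-1"`, `"utf-8"`, or `None` when the word does not occur or the counts tie. In the `None` case you would fall back to another heuristic or to a manual check.
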
Re^10: convert files to ansi (8859-1)
by Yaerox (Scribe) on Mar 29, 2017 at 09:37 UTC
    We decided to develop a bytewise reader and converter. We need such an algorithm in multiple places anyway, so putting the effort in seems to us the most productive way for now ...

    Thanks a lot.