
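Every file is valid ISO-8859-1: it is a single-byte encoding that assigns a character to every one of the 256 possible byte values, so any sequence of bytes decodes without error.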
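A quick way to convince yourself of this with the core Encode module: build a string that holds every byte value and decode it as ISO-8859-1 with strict error checking. This is a minimal sketch; it only shows that the decode cannot fail, not that the result is the text the author intended.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A byte string containing every possible byte value, 0x00 .. 0xFF.
my $bytes = join '', map { chr } 0 .. 255;

# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps $bytes intact.
my $text = decode('ISO-8859-1', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Each byte becomes the character with the same code point (U+0000 .. U+00FF).
my $identical = !grep { ord(substr($text, $_, 1)) != $_ } 0 .. 255;
print "decoded ", length($text), " characters, identity mapping: ",
      ($identical ? "yes" : "no"), "\n";
```

Swapping 'ISO-8859-1' for 'UTF-8' in the decode() call makes it die, which is why a strict UTF-8 decode is a usable test while a Latin-1 decode is not.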

Re^8: convert files to ansi (8859-1)
by Yaerox (Scribe) on Mar 29, 2017 at 08:42 UTC
    Well, that explains a lot... so I need another way to detect the encoding. Is there a known way to do this?

    I read about Encode::Guess; maybe I should take a look at it?
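For reference, a minimal sketch of how Encode::Guess could be used here. With its default suspects it distinguishes ASCII from UTF-8 (and BOM-marked UTF-16/32); guess_encoding() returns an encoding object on success and an error string otherwise. Latin-1 is deliberately left out of the suspect list: since it accepts every byte sequence, it would match everything, so the sketch uses it only as a fallback. The helper name guess_or_latin1 is hypothetical.

```perl
use strict;
use warnings;
use Encode::Guess;   # default suspects: ascii and utf8

# Hypothetical helper: returns the guessed encoding name, or falls back to
# ISO-8859-1, which accepts any byte sequence and so can never be ruled out.
sub guess_or_latin1 {
    my ($bytes) = @_;
    my $enc = guess_encoding($bytes);   # object on success, error string on failure
    return ref $enc ? $enc->name : 'iso-8859-1';
}

my @samples = (
    [ 'ASCII only', "plain text"  ],
    [ 'UTF-8',      "\xC3\xBCber" ],
    [ 'Latin-1',    "\xFCber"     ],
);
for my $s (@samples) {
    printf "%-10s => %s\n", $s->[0], guess_or_latin1($s->[1]);
}
```

Note that pure ASCII is reported as 'ascii'; it is also valid UTF-8 and valid Latin-1, so no conversion is needed for such files.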

      My approach to guessing the encoding would be to look for well-known phrases or trigrams. If you know the language of the text, look for trigrams (or longer sequences) whose byte representation differs between the candidate encodings.

      "über" would be a good German word, since it appears commonly enough in text. The byte sequence you find tells you the encoding:

          "\xFCber"       # ANSI / Latin-1
          "\xC3\xBCber"   # UTF-8

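The marker idea can be sketched in a few lines of Perl. The marker table below is an assumption for illustration (one word, two encodings); a real detector would use many frequent words or trigrams of the expected language. The helper name guess_by_marker is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical marker table: the German word "über" as raw bytes in each
# candidate encoding (Windows-1252 "ANSI" agrees with Latin-1 for this word).
my %marker = (
    'iso-8859-1' => "\xFCber",
    'utf-8'      => "\xC3\xBCber",
);

# Count occurrences of each marker in a byte string and return the encoding
# with the most hits, or undef when no marker occurs (no evidence either way).
sub guess_by_marker {
    my ($bytes) = @_;
    my ($best, $best_count) = (undef, 0);
    for my $enc (sort keys %marker) {
        my $count = () = $bytes =~ /\Q$marker{$enc}\E/g;
        ($best, $best_count) = ($enc, $count) if $count > $best_count;
    }
    return $best;
}

print guess_by_marker("Hinweise \xFCber den Import") // 'unknown', "\n";
```

The two markers cannot overlap: the Latin-1 form needs the byte 0xFC, which never occurs in UTF-8 text, so a hit for one marker is not also counted for the other.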
        We decided to develop a byte-wise reader and converter. We need such an algorithm in several places anyway, so putting in the effort seems the most productive way for us right now...

        Thanks a lot.
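For the byte-wise route mentioned above, the core of a UTF-8 check is small. This is a minimal sketch, not the poster's implementation: the function names are hypothetical, the byte ranges follow the UTF-8 definition in RFC 3629, and treating anything that is not valid UTF-8 as Latin-1 is a heuristic. The target is ISO-8859-1 as in the thread title; for the Windows "ANSI" code page, 'cp1252' would be used instead.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical byte-wise check: 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my @b = unpack 'C*', $bytes;
    my $i = 0;
    while ($i < @b) {
        my $c = $b[$i];
        my ($need, $lo, $hi);   # continuation bytes needed, allowed range of 2nd byte
        if    ($c <= 0x7F)               { $i++; next }
        elsif ($c >= 0xC2 && $c <= 0xDF) { ($need, $lo, $hi) = (1, 0x80, 0xBF) }
        elsif ($c == 0xE0)               { ($need, $lo, $hi) = (2, 0xA0, 0xBF) }
        elsif ($c == 0xED)               { ($need, $lo, $hi) = (2, 0x80, 0x9F) }  # no surrogates
        elsif ($c >= 0xE1 && $c <= 0xEF) { ($need, $lo, $hi) = (2, 0x80, 0xBF) }
        elsif ($c == 0xF0)               { ($need, $lo, $hi) = (3, 0x90, 0xBF) }
        elsif ($c >= 0xF1 && $c <= 0xF3) { ($need, $lo, $hi) = (3, 0x80, 0xBF) }
        elsif ($c == 0xF4)               { ($need, $lo, $hi) = (3, 0x80, 0x8F) }  # max U+10FFFF
        else                             { return 0 }   # 0x80-0xC1, 0xF5-0xFF never start a sequence
        return 0 if $i + $need >= @b;                   # truncated sequence at end of input
        return 0 unless $b[$i + 1] >= $lo && $b[$i + 1] <= $hi;
        for my $k (2 .. $need) {
            return 0 unless $b[$i + $k] >= 0x80 && $b[$i + $k] <= 0xBF;
        }
        $i += $need + 1;
    }
    return 1;
}

# Hypothetical converter: UTF-8 if valid, otherwise assumed to be Latin-1.
# Characters outside Latin-1 are replaced by '?' (encode()'s default substitution).
sub to_latin1 {
    my ($bytes) = @_;
    my $source = is_valid_utf8($bytes) ? 'UTF-8' : 'ISO-8859-1';
    return encode('ISO-8859-1', decode($source, $bytes));
}
```

Running both byte strings for "über" through to_latin1() yields the same Latin-1 bytes, "\xFCber", so already-converted files pass through unchanged.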