

Re^5: Encoding changed from Greek to something else
by daxim (Curate) on Jul 20, 2007 at 18:14 UTC
    > Encoding (conversion from bytes to chars)

    Nah, that's decoding. :)

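    Anyway, that file has been double encoded: its UTF-8 bytes were decoded with the wrong charset, and the result was encoded to UTF-8 a second time. Unfortunately, that conversion was lossy, so it is not possible to get 100% of the original text back.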
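    One way such damage can happen, simulated in Python. This is a sketch under assumptions, not a reconstruction of what actually happened to the file: it assumes the wrong charset was Windows-1253 (Python codec name cp1253) and that bytes with no mapping in it were turned into the replacement character U+FFFD. The function names double_encode and try_undo are hypothetical. The unmapped bytes are where the loss comes from: once replaced, the original byte value is gone.

```python
def double_encode(text: str, wrong_codec: str = "cp1253") -> bytes:
    """Hypothetical simulation of how a file ends up double encoded."""
    utf8_bytes = text.encode("utf-8")  # correct UTF-8, as originally saved
    # The mistake: the UTF-8 bytes are decoded as a single-byte Greek charset.
    # Assumption: bytes unmapped in that charset become U+FFFD (the lossy step).
    misread = utf8_bytes.decode(wrong_codec, errors="replace")
    # The misread characters are saved as UTF-8 again: double encoding.
    return misread.encode("utf-8")


def try_undo(data: bytes, wrong_codec: str = "cp1253") -> str:
    """Reverse the steps: decode as UTF-8, encode back to the wrong charset,
    then decode those bytes as UTF-8. Characters hit by U+FFFD stay broken."""
    misread = data.decode("utf-8")
    original_bytes = misread.encode(wrong_codec, errors="ignore")
    return original_bytes.decode("utf-8", errors="replace")


print(try_undo(double_encode("καλά")))    # round-trips cleanly
print(try_undo(double_encode("καρδιά")))  # 'ρ' is damaged: its byte 0x81 is unmapped in cp1253
```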

    Run these two commands:

    iconv -c -f utf8 -t windows-1253 < problem.txt > problem-1253.txt
    iconv -c -f utf8 -t ISO-8859-7 < problem.txt > problem-7.txt

    iconv for Windows: http://gnuwin32.sourceforge.net/packages/libiconv.htm
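    If installing iconv is inconvenient, the same repair can be sketched in Python. This is a rough equivalent of the two iconv -c commands, not a drop-in replacement; repair is a hypothetical helper name, and cp1253 and iso8859_7 are Python's names for Windows-1253 and ISO-8859-7. With errors="ignore", characters that have no mapping in the target charset are dropped, which is what -c does.

```python
from pathlib import Path


def repair(data: bytes, codec: str) -> bytes:
    """Rough equivalent of `iconv -c -f utf8 -t <codec>`."""
    text = data.decode("utf-8")  # -f utf8: read the file as UTF-8 characters
    # -t <codec> with -c: characters with no mapping in the target charset
    # (e.g. U+FFFD left over from the bad conversion) are silently dropped.
    return text.encode(codec, errors="ignore")  # these bytes are UTF-8 again


if __name__ == "__main__" and Path("problem.txt").exists():
    source = Path("problem.txt").read_bytes()
    Path("problem-1253.txt").write_bytes(repair(source, "cp1253"))
    Path("problem-7.txt").write_bytes(repair(source, "iso8859_7"))
```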

    The two resulting files are in UTF-8, so open them as UTF-8 in your editor. I can recognise some words, such as Νίκος and Νικόλα, in fragments. You will have to spend some time piecing the two files together.
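    Part of that piecing together might be automated with a simple heuristic, sketched here in Python. It is a hypothetical approach, not a guaranteed fix: it assumes both repaired files have the same lines in the same order (they come from the same input and the repair only drops characters), and for each line it keeps whichever version is valid UTF-8, preferring the Windows-1253 one. For two-byte Greek characters, a dropped continuation byte leaves an orphaned lead byte that fails to decode, so such lines are detected; lines broken in both files still need manual work. The output name problem-merged.txt is hypothetical.

```python
from pathlib import Path


def is_valid_utf8(line: bytes) -> bool:
    """True if the line decodes as UTF-8 without errors."""
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def merge_repairs(preferred: bytes, fallback: bytes) -> bytes:
    """Per line, keep `preferred` unless only `fallback` is valid UTF-8.
    Assumes both inputs have the same lines in the same order."""
    merged = []
    pairs = zip(preferred.splitlines(keepends=True),
                fallback.splitlines(keepends=True))
    for line_p, line_f in pairs:
        if is_valid_utf8(line_p) or not is_valid_utf8(line_f):
            merged.append(line_p)
        else:
            merged.append(line_f)
    return b"".join(merged)


if __name__ == "__main__" and Path("problem-1253.txt").exists() and Path("problem-7.txt").exists():
    result = merge_repairs(Path("problem-1253.txt").read_bytes(),
                           Path("problem-7.txt").read_bytes())
    Path("problem-merged.txt").write_bytes(result)
```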

    As for your editor: Windows editors generally write out a BOM for UTF-8 too, even though a BOM is only necessary for the two kinds of UTF-16 encoding (big-endian and little-endian). On Windows, a UTF-8 BOM has some use even in ordinary text files. However, on Unix, and in source code files in general, a UTF-8 BOM is unwanted, mostly because it interferes with the shebang line. You should switch off the BOM for UTF-8 files in your editor; if that is not possible, get a better one.
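    For files that already carry a BOM, it can be removed after the fact. A minimal Python sketch with hypothetical helper names: a UTF-8 BOM is the three bytes EF BB BF, and on Unix a script's interpreter is only recognised when the file begins with the two bytes #!, so the BOM has to go for the shebang to work.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def fix_file(path: Path) -> bool:
    """Rewrite `path` without its UTF-8 BOM. Returns True if one was removed."""
    data = path.read_bytes()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        path.write_bytes(cleaned)
        return True
    return False
```

    When only reading text in Python, the utf-8-sig codec also skips a leading BOM if one is present.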
