http://qs1969.pair.com?node_id=1141855


in reply to Re: What is the proper way to read non-ANSI data
in thread What is the proper way to read non-ANSI data

I redirect the output to a UTF-8 file and compare it in Notepad++ to the redirected output of dumptorrent.exe. The characters display as â€“ instead of –, for example.

Re^3: What is the proper way to read non-ANSI data
by afoken (Chancellor) on Sep 14, 2015 at 19:42 UTC
    Notepad++. The characters display as â€“

    Notepad++, like every other program, can only guess the encoding of a plain text file, because the file itself carries no encoding information. Some other file formats, like HTML, can declare the encoding they use. Still others are defined to be UTF-8, like Go source files (Java sources, IIRC, are not: javac has an -encoding option).

    So, Notepad++ may just guess wrong. Check in the status bar which encoding Notepad++ guessed (probably ANSI). Use the Encoding menu to switch (not convert!) the encoding.
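
    To see why a wrong guess produces exactly this kind of garbage, here is a minimal sketch. It assumes the character in question is an en dash (U+2013) and that "ANSI" means Windows-1252, which is the case on a Western European Windows but not everywhere. It prints code points instead of the characters themselves, so the console encoding does not get in the way.

    ```perl
    use strict;
    use warnings;
    use Encode qw(encode decode);

    # U+2013 EN DASH, encoded as UTF-8, is the three bytes E2 80 93.
    my $bytes = encode('UTF-8', "\x{2013}");

    # Decoding those same bytes as Windows-1252 (assumed to be the "ANSI"
    # code page here) yields three unrelated characters instead of one dash.
    my $misread = decode('cp1252', $bytes);

    # Format both as hex so the output is plain ASCII.
    my $hex_bytes   = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    my $code_points = join ' ', map { sprintf 'U+%04X', ord } split //, $misread;

    print "UTF-8 bytes:    $hex_bytes\n";    # E2 80 93
    print "read as cp1252: $code_points\n";  # U+00E2 U+20AC U+201C
    ```

    Those three code points are "â", "€" and a left double quotation mark, i.e. the â€“ seen in the editor. The file is fine; only the interpretation is wrong.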

    A trick that works quite often is to write a Byte Order Mark ("\x{FEFF}") as the first character of any file that is encoded in some Unicode encoding, including UTF-8. It is not strictly required for UTF-8, but it helps most programs, including Notepad++, to guess the encoding correctly.
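
    A minimal sketch of the BOM trick in Perl might look like this. The helper name write_utf8_with_bom and the file name with_bom.txt are made up for illustration; the only real ingredients are the :encoding(UTF-8) layer and the "\x{FEFF}" character.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: write $text to $path as UTF-8 with a leading BOM.
    sub write_utf8_with_bom {
        my ($path, $text) = @_;
        open(my $fh, '>:encoding(UTF-8)', $path)
            or die "Cannot open $path for writing: $!";
        print {$fh} "\x{FEFF}";   # the layer encodes this as the bytes EF BB BF
        print {$fh} $text;
        close($fh) or die "Cannot close $path: $!";
        return;
    }

    # Example call with an illustrative file name and an en dash in the text.
    write_utf8_with_bom('with_bom.txt', "An en dash: \x{2013}\n");
    ```

    Because the BOM is printed through the encoding layer, the same code would also produce a suitable BOM for a different Unicode layer such as :encoding(UTF-16LE).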

    In most cases, the BOM does not hurt. An exception is any kind of Unix script, which must start with "#!" and not with a BOM. A BOM makes the script unrecognisable to the kernel, leading to bizarre error messages.
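
    If a script suddenly refuses to start, a quick check for a stray BOM can save some head-scratching. This is only a sketch: starts_with_utf8_bom is a hypothetical helper, and it looks for the UTF-8 BOM only (bytes EF BB BF), not for the UTF-16 or UTF-32 variants.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: true if the file at $path starts with a UTF-8 BOM.
    sub starts_with_utf8_bom {
        my ($path) = @_;
        open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
        my $read = read($fh, my $head, 3);   # raw bytes, no decoding
        die "Cannot read $path: $!" unless defined $read;
        close($fh);
        return $head eq "\xEF\xBB\xBF";
    }

    # Report on every file named on the command line.
    for my $path (@ARGV) {
        printf "%s: %s\n", $path,
            starts_with_utf8_bom($path) ? 'starts with a BOM' : 'no BOM';
    }
    ```

    With a BOM in place, the first two bytes of the file are EF BB rather than the 23 21 of "#!", so the kernel never gets as far as the interpreter line.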

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)