
Am I correct that what my OS is telling me is its best guess as to how to interpret this file and have it make any sense?

Yes, with the emphasis being that it's just a guess.

the OS thinks the doc is utf8 if there are utf8 non-ascii characters in it.

Yes, and it might be important to note that there are certain sequences of bytes that are not valid UTF-8 (see e.g. UTF-8), which means that in some cases, it's possible to differentiate between random bytes and valid UTF-8 text. Also, just to nitpick a little, it's not the OS guessing the file's encoding, it's the file tool.
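To illustrate the point about invalid byte sequences, here is a minimal sketch (in Python, purely for illustration; the helper name `looks_like_utf8` is made up for this example). It simply attempts a strict decode. Passing the check only means the bytes are consistent with UTF-8, so it is still a guess, not proof of the file's real encoding.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 in strict mode."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Sample inputs (hypothetical, for illustration only)
cyrillic = "привет".encode("utf-8")  # well-formed multi-byte UTF-8
latin1 = "café".encode("latin-1")    # ends in a lone 0xE9 byte, not valid UTF-8
ascii_only = b"hello"                # pure ASCII is also valid UTF-8

for sample in (cyrillic, latin1, ascii_only):
    print(sample, looks_like_utf8(sample))
```

Note that the ASCII-only sample passes too, which is why a file without any non-ASCII characters can be reported as either us-ascii or UTF-8 without contradiction.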

I did nothing with the #.haukex scripts to change from us-ascii to utf8 but begin to include cyrillic characters

Note that if you have a file that is originally ASCII and you add non-ASCII characters to it, it's up to the editor to choose which encoding it will use when saving the file. Many editors will default to UTF-8, but some may not!
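A small sketch of why the editor's choice matters (again in Python for illustration; the sample word and the two legacy encodings are just assumptions picked for the example): the same Cyrillic text turns into entirely different bytes on disk depending on the encoding used when saving.

```python
text = "Привет"  # six Cyrillic letters

utf8_bytes = text.encode("utf-8")      # two bytes per letter here
cp1251_bytes = text.encode("cp1251")   # one byte per letter (Windows-1251)
koi8_bytes = text.encode("koi8_r")     # one byte per letter, different values

for name, raw in (("utf-8", utf8_bytes),
                  ("cp1251", cp1251_bytes),
                  ("koi8_r", koi8_bytes)):
    print(f"{name:7} {len(raw):2d} bytes  {raw.hex(' ')}")
```

Reading the file back with the wrong encoding either fails or silently produces the wrong characters, so it is worth checking what your editor actually wrote.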

with pre tags

You may have noticed that when using <pre> tags, you have to escape square brackets: [ becomes &#91; and ] becomes &#93;.
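If you do stay with <pre> tags, the escaping can be automated before pasting. A minimal sketch (Python for illustration; `escape_brackets` is a hypothetical helper, and it handles only the two bracket characters mentioned above, not other HTML special characters):

```python
def escape_brackets(text: str) -> str:
    """Replace [ and ] with their numeric HTML entities."""
    return text.replace("[", "&#91;").replace("]", "&#93;")


print(escape_brackets("my $first = $list[0];"))
```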

Update: Improved wording of first paragraph.

Re^4: create clone script for utf8 encoding
by Aldebaran (Curate) on Dec 19, 2018 at 21:23 UTC
    it might be important to note that there are certain sequences of bytes that are not valid UTF-8 (see e.g. UTF-8), which means that in some cases, it's possible to differentiate between random bytes and valid UTF-8 text.

    I see.

    Also, just to nitpick a little, it's not the OS guessing the file's encoding, it's the file tool.

    Thank you for the delousifying reference at file. I pulled out what I thought was relevant. I've "known" this before, but if you get behind on reading, things change:

    You may have noticed that when using pre tags, you have to escape square brackets

    I do now. Life is like a box of chocolates with pre tags for this particular Forrest Gump. The engine that parses the XML is gonna look at [ ] and create a hyperlink, isn't it? I think I'm gonna go back to code tags, even when the content has Cyrillic. Others can make a clean download without having to copy and paste off the screen.