use <pre> instead of <code> for unicode

Re^4: create clone script for utf8 encoding
by Aldebaran (Curate) on Dec 17, 2018 at 23:42 UTC
    use pre instead of code for unicode

    Sure enough...this is a repeat of the diff command with pre tags:

    $ echo Привет >3.file
    $ diff 1.file 3.file
    1c1
    < ������
    ---
    > Привет
    $ 
    
    

    Hmmm, well there it is. I tried pre tags in the writeup but must not have pasted them in and previewed correctly. There is something to learn from seeing the numerical representations of these characters. I was surprised that diff appeared to see 1.file as six copies of code point 65533, which is the Unicode replacement character, U+FFFD. Further reading and clarification here: unicode specials
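
    A quick way to check that 65533 and U+FFFD are the same number, and to see how that character is stored in UTF-8. This is only a sketch: the variable names are made up for illustration, and it assumes a POSIX shell with od and tr available.

```shell
# U+FFFD written as a decimal number
replacement_dec=$(printf '%d' 0xFFFD)
echo "U+FFFD = $replacement_dec"

# The UTF-8 encoding of U+FFFD, given here as octal escapes (EF BF BD),
# dumped as hex with the spacing removed
replacement_hex=$(printf '\357\277\275' | od -An -tx1 | tr -d ' \n')
echo "UTF-8 bytes: $replacement_hex"
```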

    How did you get single code and pre tags to display (surrounded by <>) and not foul the legibility?

    Also, is there a way to employ the diff command so that the equality in these files could be established? (not essential or vital to this coding task)

      I tried pre tags in the writeup but must not have pasted it in and previewed correctly.
      PerlMonks engine automatically replaces all symbols not representable in ASCII by their HTML entity codes: ы → &#1099;. The <code> tags are special non-HTML tags that don't allow HTML entities inside them to be interpreted, but the transformation still takes place. (How did I write that? <tt>ы</tt> → <c>ы</c> and let PerlMonks make the replacement, knowing that the entity code inside <tt>...</tt> will be interpreted back into ы, while the one inside <c>...</c> won't. How did I write what I just wrote? Lots of &lt;s and the <code> = <c> equivalence.)
      It is the unicode replacement character: U+FFFD.
      The replacement character is what you get when your terminal emulator tries to decode KOI8-R-encoded bytes as UTF-8 and fails. The actual output of diff contains both KOI8-R-encoded and UTF-8-encoded bytes, and it can be decoded as KOI8-R:
      $ diff 1.file 3.file | iconv -f koi8-r
      1c1
      < Привет
      ---
      > п÷я─п╦п╡п╣я┌
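
      On the question of getting diff to report the files as equal: one option, assuming 1.file really is KOI8-R, is to convert it to UTF-8 on the fly and compare the result with 3.file. The sketch below recreates both files in a scratch directory (the way 1.file is produced here is an assumption, not how the original was made) and uses only iconv and diff reading from standard input.

```shell
# Work in a scratch directory so no real files are touched
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# 3.file is UTF-8; 1.file is assumed to hold the same word in KOI8-R
printf 'Привет\n' > 3.file
iconv -f utf-8 -t koi8-r 3.file > 1.file

# Raw comparison: the bytes differ, so diff exits non-zero
if diff 1.file 3.file > /dev/null; then raw=same; else raw=differ; fi

# Convert 1.file to UTF-8 first, then compare; "-" makes diff read stdin
if iconv -f koi8-r -t utf-8 1.file | diff - 3.file > /dev/null; then
    converted=same
else
    converted=differ
fi

echo "raw: $raw"
echo "converted: $converted"
```

      If the guess about the encoding is right, the second comparison should print nothing from diff and exit with status 0.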

        PerlMonks engine automatically replaces all symbols not representable in ASCII by their HTML entity codes

        Nit: It's actually the browser doing that, and it's for characters outside of cp1252 (not ASCII).

        PerlMonks pages are returned as cp1252, so the browser knows that forms must be submitted using cp1252. Characters outside of cp1252 (e.g. "ы") can't be submitted, but rather than throwing an error when such characters are provided, the browser submits an HTML numeric entity for the character (e.g. "&#1099;") instead, just in case that works. Of course, it doesn't work within code tags, because PerlMonks escapes "&" as "&amp;" in code tags so that it shows up as a literal "&".
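
        The number in such an entity is just the character's Unicode code point in decimal. A rough sketch of the arithmetic for "ы", assuming iconv, od and tr are available; the variable names are illustrative, and going through UTF-16BE like this only works for characters in the Basic Multilingual Plane:

```shell
# Encode the character as UTF-16BE; for a BMP character the two bytes
# are its code point in hex
hex=$(printf 'ы' | iconv -f utf-8 -t utf-16be | od -An -tx1 | tr -d ' \n')

# Turn the hex code point into the decimal numeric entity
entity="&#$(printf '%d' "0x$hex");"

echo "code point: U+$hex"
echo "entity: $entity"
```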