in reply to uc and German eszett "ß"

G'day Rolf,

Here's all of the variations that I could think of:

$ perl -v | head -2 | tail -1
This is perl 5, version 34, subversion 0 (v5.34.0) built for cygwin-thread-multi
$ echo $LANG
en_AU.UTF-8
$ alias perlu
alias perlu='perl -Mstrict -Mwarnings -Mautodie=:all -Mutf8 -C -E'
$ perlu '
    say "$_ -> ", ord($_) for
        "ß", "\Uß", uc("ß"),
        "ẞ", "\Lẞ", lc("ẞ"),
        "\Fß", fc("ß"),
        "\Fẞ", fc("ẞ");
'
ß -> 223
SS -> 83
SS -> 83
ẞ -> 7838
ß -> 223
ß -> 223
ss -> 115
ss -> 115
ss -> 115
ss -> 115
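
Note that ord() only reports the first character of its argument, which is why the two-character results show a single number (83 for "S", 115 for "s"). A minimal sketch that lists every code point along with the string length follows; code_points() is a hypothetical helper written for this illustration, not part of the original test.

```perl
use strict;
use warnings;
use utf8;
use feature qw(say fc);
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: list every code point in a string as U+XXXX.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

# Same strings as above, but showing length and all code points,
# not just the first one.
for my $str ("ß", uc("ß"), "ẞ", lc("ẞ"), fc("ß"), fc("ẞ")) {
    say "$str -> length ", length($str), ": ", code_points($str);
}
```

With this, uc("ß") should show a length of 2 and U+0053 twice, which is consistent with the 83 reported above.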

From "Re^2: uc and German eszett "ß"":

"Furthermore is ẞ a display problem of the monastery's code blocks, the character prints well inside my emacs."

When using non-ASCII characters, I replace "code" with "pre" and "c" with "tt". I think the problem is more to do with PM's encoding than a specific code block issue; for example, you'll get the same rendering of entities, rather than characters, in paragraph text. Someone more knowledgeable may have a better (more complete) answer to that.

Update: I removed four instances of ken@titan ~/tmp that preceded each of the commands above. I had originally just done a copy-paste from my screen, but that information is irrelevant clutter.

— Ken

Re^2: uc and German eszett "ß"
by cavac (Prior) on Feb 02, 2022 at 14:01 UTC

    The HTML specs are not very specific about how "code" vs "pre" really works. It's mostly on the order of "Dear Browser! FYI, this part is some sort of program code thing. Please do something about it if you want (but you are not required to)." Basically, the "code" tag says "here is some text using the monospace font family".

    The "pre" tag is just as vague: it preserves line breaks and other whitespace characters and uses a fixed-width font. And again, that is pretty much all that the standard says about it, as far as I can tell.

    perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
      The HTML spec is irrelevant here; PerlMonks interprets <code> in its own way.

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      As ++choroba correctly points out, PM <code> is not HTML <code>. See "Markup in the Monastery".

      The PM <code> tag provides some conveniences. It automatically handles certain special characters; for instance, you can paste code with $x < $y without having to manually change that to $x &lt; $y. It also adds the "download" link for blocks of code.

      The <code> and <c> tags are interchangeable. I usually use the former for blocks and the latter for inline: that's just a personal preference.

      With <pre> and <tt>, you will need to manually edit special characters; accordingly, I try to keep these as small as possible. You also don't get the "download" link.
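
The manual editing described above can be sketched as a small filter. This is only an illustration under stated assumptions: escape_for_pre() is a hypothetical helper, not anything PerlMonks provides, and converting non-ASCII characters to numeric entities is my assumption about what pastes most reliably. It does not attempt to handle any other PerlMonks-specific markup.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: make text safe to paste inside <pre> or <tt>.
sub escape_for_pre {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    # Assumption: non-ASCII characters are written as decimal numeric entities.
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print escape_for_pre('$x < $y'), "\n";    # prints: $x &lt; $y
print escape_for_pre('uc("ß")'), "\n";    # prints: uc("&#223;")
```

The ampersand substitution comes first on purpose; running it last would turn the freshly created &lt; into &amp;lt;.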

      — Ken