in reply to uc and German eszett "ß"

Mystery solved: after reading the docs for fc I realized that Perl implements the Unicode standard's rules for case folding.

The docs also claim that fc is the only acceptable solution for comparing "case insensitive strings".

Unfortunately we are working with an IBM product°, which sees a difference between fields named "Straße" and "Strasse", while fc (and German readers) will consider them identical.
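To make the difference concrete, here is a minimal sketch comparing the two field names with eq, lc and fc. It assumes Perl 5.16 or later (where fc is available); the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                  # the source below contains a literal "ß"
use feature qw( fc say );  # fc is available from Perl 5.16

my ($x, $y) = ("Straße", "Strasse");

# Plain equality and lc() keep "ß" distinct from "ss" ...
my $same_eq = $x eq $y         ? 1 : 0;
my $same_lc = lc($x) eq lc($y) ? 1 : 0;

# ... while full case folding maps "ß" to "ss".
my $same_fc = fc($x) eq fc($y) ? 1 : 0;

say "eq: $same_eq, lc: $same_lc, fc: $same_fc";   # eq: 0, lc: 0, fc: 1
```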

Now regarding uc of "ß", that's to be decided between the Unicode committee and German language authorities.

(this is not so much a problem for German readers but for CS and "equality of strings")
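The following sketch shows why uc and "equality of strings" do not mix well here: upper-casing "ß" is not reversible with lc, whereas fc gives one stable form. It assumes Perl 5.16 or later; U+1E9E (capital sharp s) is included only for comparison.

```perl
use strict;
use warnings;
use utf8;
use feature qw( fc say unicode_strings );

binmode STDOUT, ':encoding(UTF-8)';

my $eszett     = "ß";          # U+00DF LATIN SMALL LETTER SHARP S
my $cap_eszett = "\x{1E9E}";   # U+1E9E LATIN CAPITAL LETTER SHARP S

my $upper     = uc $eszett;      # full upper-case mapping: "SS"
my $roundtrip = lc $upper;       # "ss", so lc(uc($s)) ne $s
my $lower_cap = lc $cap_eszett;  # the capital form lower-cases to "ß"
my $fold_cap  = fc $cap_eszett;  # but folds to "ss", like "ß" itself

say "uc(ß)     = $upper";
say "lc(uc(ß)) = $roundtrip";
say "lc(ẞ)     = $lower_cap";
say "fc(ẞ)     = $fold_cap";
```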

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

update

°) which is surprisingly fucked up in this respect https://www.ibm.com/docs/en/cognos-tm1/10.2.2?topic=pitf-unicodeupperlowercase-1

Re^2: uc and German eszett "ß" (Unicode standard)
by choroba (Cardinal) on Feb 02, 2022 at 12:10 UTC
    Unicode is hard. IBM is not to blame, we're having problems with ICU and Postgres, too.
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      > Unicode is hard.

      I agree, but what they did is even "harder".

      Identifiers containing non-ASCII (Unicode) characters are case-sensitive, while pure-ASCII identifiers are case-insensitive.

      update

        TM1 treats ASCII object names as case-insensitive; the element name SALES is equivalent to sales. A reference to either SALES, sales, or even SaLeS is considered to be a reference to a single element. Similarly, the cube name Projections is equivalent to PROJECTIONS.

        However, Unicode object names are not treated as case-insensitive. Consequently, a server can contain two identically named objects that varied only in case. For example, the elements NEMÈIJA and nemèija can exist in a single dimension, and each is considered a unique element
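The rule quoted above can be imitated in a few lines of Perl. This is only a sketch of the documented behaviour, not IBM's implementation, and tm1_like_key is a hypothetical helper name.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: ASCII-only names collapse to one lower-case key,
# names containing any non-ASCII character are kept verbatim.
sub tm1_like_key {
    my ($name) = @_;
    return $name =~ /\A[\x00-\x7F]*\z/ ? lc $name : $name;
}

my $ascii_same   = tm1_like_key("SALES")   eq tm1_like_key("SaLeS")   ? 1 : 0;
my $unicode_same = tm1_like_key("NEMÈIJA") eq tm1_like_key("nemèija") ? 1 : 0;
my $street_same  = tm1_like_key("Straße")  eq tm1_like_key("Strasse") ? 1 : 0;

print "ASCII: $ascii_same, non-ASCII: $unicode_same, Straße/Strasse: $street_same\n";
# ASCII: 1, non-ASCII: 0, Straße/Strasse: 0
```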

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re^2: uc and German eszett "ß" (Unicode standard)
by ikegami (Patriarch) on Feb 04, 2022 at 15:22 UTC

    //i is equally acceptable.

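    That's why qr/(?<=ss)x/ui fails in versions of Perl without experimental (and buggy) variable-length look-behinds: under /ui, "ss" also matches the single character "ß", so the look-behind can be one or two characters long. /u is added by use feature qw( unicode_strings ); since 5.14, and thus by use v5.14; and higher.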

    Also,

    $ perl -Mre=debug -e'qr/[abc\xDF]/ui'
    Compiling REx "[abc\xDF]"
    ~ tying lastbr BRANCH (4) to ender TAIL (15) offset 11
    ~ tying lastbr BRANCH (1) to ender END (16) offset 15
    Final program:
       1: BRANCH (4)
       2:   EXACTFUP <ss> (16)
       4: BRANCH (FAIL)
       5:   ANYOF[ABCabc] (16)
      15: TAIL (16)
      16: END (0)
    minlen 1
    Freeing REx: "[abc\xDF]"
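The claim that //i and fc agree can be checked with a small sketch like the one below. It assumes Perl 5.16 or later and uses literal patterns rather than interpolated ones, to keep quoting issues out of the picture.

```perl
use v5.16;      # enables strict, say, fc and unicode_strings
use warnings;
use utf8;

# Pattern written with "ss", target written with "ß" ...
my $ss_matches_eszett = "Straße"  =~ /\Astrasse\z/i ? 1 : 0;

# ... and the other way round.
my $eszett_matches_ss = "Strasse" =~ /\Astraße\z/i  ? 1 : 0;

# The same comparison done with fc.
my $fc_equal = fc("Straße") eq fc("Strasse") ? 1 : 0;

say "ss vs ß: $ss_matches_eszett, ß vs ss: $eszett_matches_ss, fc: $fc_equal";
```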
      I think fc and //i should have the same inner semantics.

      But //i doesn't help in our case since we needed hash-keys.
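For the hash-key case, one possible approach is to store values under the fc form of the name. The following is a minimal sketch under that assumption; the field names and %names_by_key are illustrative, not taken from the actual project.

```perl
use strict;
use warnings;
use utf8;
use feature qw( fc say );

binmode STDOUT, ':encoding(UTF-8)';

# Group the original spellings under their case-folded key.
my %names_by_key;
for my $name ("Straße", "STRASSE", "Strasse", "Ort") {
    push @{ $names_by_key{ fc($name) } }, $name;
}

for my $key (sort keys %names_by_key) {
    say "$key: @{ $names_by_key{$key} }";
}
```

All three spellings of the street field end up under the single key "strasse", which is the opposite of what a system that treats them as distinct would need.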

      Anyway this was already solved on another level ...

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        I think fc and //i should have the same inner semantics.

        They do. That's literally the point of the message to which you replied.

        Anyway this was already solved on another level ...

        I wasn't proposing a solution; I was contradicting the claim that «fc is the only acceptable solution for comparing "case insensitive strings".»