in reply to Re^2: incorrect length of strings with diphthongs
in thread incorrect length of strings with diphthongs

Yes, I'd say it's similar to the "ethnic" modifiers of face emojis.

But my expectation is that those modifiers don't count as characters and have length 0, i.e. "Hütte" should have length 5 in both incarnations.

> how length() is implemented?

I may be wrong tho...

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re^4: incorrect length of strings with diphthongs
by choroba (Cardinal) on Aug 30, 2022 at 17:48 UTC
    #!/usr/bin/perl
    use strict;
    use feature qw{ say };
    use warnings;

    use Unicode::Normalize qw{ normalize };
    use Unicode::GCString;

    my $char = "\N{LATIN SMALL LETTER U WITH DIAERESIS}";
    binmode *STDOUT, ':encoding(UTF-8)';

    for (qw( D C )) {
        my $n = normalize($_, $char);
        my $gcs = 'Unicode::GCString'->new($n);
        say join ' ', length($n), $n =~ s/(\X)/$1/g, $1, $gcs->chars, $gcs->columns, $gcs->length;
    }

    2 1 ü 2 1 1
    1 1 ü 1 1 1
    

    Update: Added the output.
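
For anyone without Unicode::GCString installed, here is a minimal sketch of the same comparison using only core modules, applied to the whole word instead of a single character. The helper name `grapheme_count` is made up for this example; it counts `\X` matches in list context, which is one way (not necessarily the best one) to count extended grapheme clusters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
use Unicode::Normalize qw{ NFC NFD };

# Hypothetical helper (not from the thread): count extended grapheme
# clusters by matching \X repeatedly and counting the matches.
sub grapheme_count {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

binmode *STDOUT, ':encoding(UTF-8)';

my $word = "H\N{LATIN SMALL LETTER U WITH DIAERESIS}tte";

# Print the word, its length() and its grapheme count for both forms.
for my $form (NFD($word), NFC($word)) {
    say join ' ', $form, length($form), grapheme_count($form);
}
```

The decomposed form should report a length of 6 but 5 grapheme clusters, while the composed form reports 5 for both.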

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Interesting, looks like code.

      I might even be able to install those modules and try to understand the output you didn't provide (yet)!

      ;-P

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        Check the update :-)

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^4: incorrect length of strings with diphthongs
by LanX (Saint) on Aug 30, 2022 at 20:04 UTC
    > I may be wrong tho...

    I certainly am...

    #!/usr/bin/perl
    use v5.12;
    use strict;
    use utf8;
    use Devel::Peek;

    my $trema = "\N{COMBINING DIAERESIS}";
    binmode *STDOUT, ':encoding(UTF-8)';

    my $huette = "Hu${trema}tte";
    Dump $huette;
    say "$huette\'s length: ". length($huette);

    SV = PV(0x25f4a58) at 0x25266b8
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x28da368 "Hu\314\210tte"\0 [UTF8 "Hu\x{308}tte"]
      CUR = 7
      LEN = 10
    Hütte's length: 6

    This is how it looks without code tags:

    Hütte's length: 6
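
The three numbers in play here (CUR = 7 in the dump, the reported length of 6, and the expected 5) can be reproduced side by side. This is only a sketch under the assumption that composing with NFC is acceptable for the use case; the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
use Encode qw{ encode };
use Unicode::Normalize qw{ NFC };

my $huette = "Hu\N{COMBINING DIAERESIS}tte";

# Bytes of the UTF-8 encoding: U+0308 takes two bytes (\314\210 in the dump).
my $bytes = length encode('UTF-8', $huette);

# Code points: this is what length() reports for a character string.
my $code_points = length $huette;

# After composing u + U+0308 into U+00FC, length() drops by one.
my $composed = length NFC($huette);

say "bytes: $bytes, code points: $code_points, after NFC: $composed";
```

This should print 7, 6 and 5 respectively. Note that NFC only helps where a precomposed character exists; for arbitrary combining sequences, counting `\X` matches is the more general approach.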

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery