Re: The "real length" of UTF8 strings

Can you use a regex to identify the characters which are double length? Something like:

  print xlen("(\x{5fcd}\x{65e0}\x{53ef}\x{5fcd})"), "\n" ;

  sub xlen {
    my ($s) = @_ ;
    my $l = length($s) ;
    while ($s =~ m/[\x{5000}-\x{6FFF}]/g) { $l++ ; } ;
    return $l ;
  } ;

perhaps?

Or:

  print ylen("(\x{5fcd}\x{65e0}\x{53ef}\x{5fcd})"), "\n" ;

  sub ylen {
    my ($s) = @_ ;
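    # tr takes a bare character list, so no [...] around the range
    # (with brackets, literal '[' and ']' in $s would be counted too)
    return length($s) + ($s =~ tr/\x{5000}-\x{6FFF}//) ;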
  } ;

which avoids running a while loop and may or may not be faster.
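
If you want to find out which one is faster on your data, a rough sketch using the core Benchmark module might look like this. The test string (the sample above repeated 1000 times) is just an assumption for illustration, and ylen here uses a bare range in tr, since tr treats square brackets as literal characters:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Loop-based count: one extra column per character in the range.
sub xlen {
    my ($s) = @_;
    my $l = length($s);
    $l++ while $s =~ m/[\x{5000}-\x{6FFF}]/g;
    return $l;
}

# tr-based count: tr with an empty replacement list only counts matches.
sub ylen {
    my ($s) = @_;
    return length($s) + ($s =~ tr/\x{5000}-\x{6FFF}//);
}

# Hypothetical test data: the sample string repeated 1000 times.
my $sample = "(\x{5fcd}\x{65e0}\x{53ef}\x{5fcd})" x 1000;

# Sanity check: both functions should agree before we time them.
die "xlen and ylen disagree\n" unless xlen($sample) == ylen($sample);

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    xlen => sub { xlen($sample) },
    ylen => sub { ylen($sample) },
});
```

The numbers will vary with your perl version and with how many wide characters the strings contain, so it is worth running against strings that look like your real input.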
