in reply to The "real length" of UTF8 strings
Could you use a regex to identify the characters that are double length? Something like:
    print xlen("(\x{5fcd}\x{65e0}\x{53ef}\x{5fcd})"), "\n";

    sub xlen {
        my ($s) = @_;
        my $l = length($s);
        # Add one extra column for each character in the double-width range.
        while ($s =~ m/[\x{5000}-\x{6FFF}]/g) {
            $l++;
        }
        return $l;
    }

perhaps?
Or:
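    print ylen("(\x{5fcd}\x{65e0}\x{53ef}\x{5fcd})"), "\n";

    sub ylen {
        my ($s) = @_;
        # tr/// with an empty replacement only counts matches. It takes a bare
        # range: square brackets here would be counted as literal characters.
        return length($s) + ($s =~ tr/\x{5000}-\x{6FFF}//);
    }

which avoids running a while loop and may or may not be faster.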
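The range \x{5000}-\x{6FFF} above is only a guess at which characters are double length. If "double length" means "occupies two terminal columns", a rough alternative is to ask Perl for the Unicode East_Asian_Width property instead of hard-coding a range. The sketch below assumes that interpretation, assumes the string has already been decoded to characters, and uses a made-up name (zlen); it does nothing special for combining marks or Ambiguous-width characters.

```perl
use strict;
use warnings;

# Hypothetical helper: one column per character, plus one extra column for
# each character whose East_Asian_Width is Wide or Fullwidth.
sub zlen {
    my ($s) = @_;
    # The "= () =" idiom counts the matches of a /g regex in list context.
    my $wide = () = $s =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/g;
    return length($s) + $wide;
}

print zlen("(\x{5fcd}\x{65e0}\x{53ef}\x{5fcd})"), "\n";   # prints 10
```

For the sample string this gives the same answer as the range-based versions, but it should also cover wide characters outside \x{5000}-\x{6FFF}, such as kana or fullwidth Latin letters.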