in reply to Re: Re: Re: length in bytes of utf8 string
in thread length in bytes of utf8 string

I don't understand what you mean.

Leading zeros or not, the length is wrong!

Re: Re: Re: Re: Re: length in bytes of utf8 string
by diotalevi (Canon) on Jun 27, 2003 at 10:49 UTC

    That's because "\374" eq "\xfc" eq "\x{00fc}" eq chr(252) eq chr(0xfc) eq chr(0374). That character is one byte long.

      Now I see what you mean. It fits in one byte. But Perl uses 2 bytes for it (because it is not ASCII but a German umlaut):

      use Devel::Peek;
      use bytes;
      $c = pack ("U", 0xfc);
      print Dump($c);
      print length($c);
      Output:
      SV = PV(0x15d559c) at 0x1a45848
        REFCNT = 1
        FLAGS = (POK,pPOK,UTF8)
        PV = 0x15dd39c "\303\274"\0 [UTF8 "\x{fc}"]
        CUR = 2
        LEN = 3
      2

      But it seems that representing it as \x{fc} doesn't work, as somebody else already noticed.

        Perl is supposed to treat 0xFC as a single byte. UTF-8 is the oddball here.
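
The distinction argued over in this thread can be sketched as follows. This is only an illustration, assuming a Perl with the core Encode module (5.8 or later). The variable names are made up for the example, and encoding explicitly with encode() is a swapped-in alternative to the `use bytes` approach shown above, not something the posters used.

```perl
use strict;
use warnings;
use Encode qw(encode);

# All of these spell the same single character, U+00FC.
my $c = chr(0xfc);
my $same = ($c eq "\374" && $c eq "\xfc" && $c eq "\x{00fc}" && $c eq chr(252)) ? 1 : 0;

# Without "use bytes", length() counts characters.
my $chars = length($c);

# Encoding to UTF-8 yields the octets "\303\274"; their count is the UTF-8 byte length.
my $octets     = encode('UTF-8', $c);
my $utf8_bytes = length($octets);

# pack("U", ...) gives the same character; it is merely stored as UTF-8 internally.
my $packed       = pack("U", 0xfc);
my $packed_chars = length($packed);

print "same=$same chars=$chars utf8_bytes=$utf8_bytes packed_chars=$packed_chars\n";
```

Seen this way, the 2 in the Devel::Peek dump is the size of the internal UTF-8 storage, while the character itself still counts as one.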