in reply to Re: Re: length in bytes of utf8 string
in thread length in bytes of utf8 string

The character 0x00fc is encoded as 0xfc. Just because you like to write it with leading zeros doesn't mean it should be stored that way; that would be the wrong way to store it. It's supposed to do that.

Re: Re: Re: Re: length in bytes of utf8 string
by mrd (Beadle) on Jun 27, 2003 at 10:21 UTC
    I don't understand what you mean.

    Leading zeros or not, the length is wrong!

      That's because "\374" eq "\xfc" eq "\x{00fc}" eq chr(252) eq chr(0xfc) eq chr(0374). That character is one byte long.
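A quick way to check the equality claim above is the short sketch below. It only uses core Perl; the variable names `@forms` and `$all_equal` are made up for illustration.

```perl
use strict;
use warnings;

# Six ways of writing the same character, U+00FC (decimal 252, octal 374)
my @forms = ("\374", "\xfc", "\x{00fc}", chr(252), chr(0xfc), chr(0374));

# True when no entry differs from the first one
my $all_equal = !grep { $_ ne $forms[0] } @forms;

print "all equal: ", ($all_equal ? "yes" : "no"), "\n";    # all equal: yes
print "length in characters: ", length($forms[0]), "\n";   # length in characters: 1
```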

        Now I see what you mean. It fits in one byte. But Perl uses 2 bytes for it (because it is not ASCII but a German umlaut):

            use Devel::Peek;
            use bytes;
            $c = pack ("U", 0xfc);
            print Dump($c);
            print length($c);

        Output:

            SV = PV(0x15d559c) at 0x1a45848
              REFCNT = 1
              FLAGS = (POK,pPOK,UTF8)
              PV = 0x15dd39c "\303\274"\0 [UTF8 "\x{fc}"]
              CUR = 2
              LEN = 3
            2

        But it seems that representing it as \x{fc} doesn't work, as somebody else already noticed.
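The difference between the character count and the internal byte count can be seen side by side in the sketch below. It assumes a Perl 5.8 or later with the core `bytes` and `Encode` modules; `$c` and `$d` are just illustrative names. Loading `bytes` with an empty import list makes `bytes::length` available without switching the plain `length` over to byte semantics, which is probably what produced the 2 in the post above.

```perl
use strict;
use warnings;
use bytes ();              # bytes::length only; byte semantics stay off
use Encode qw(encode);

my $c = pack("U", 0xfc);   # U+00FC, held internally as UTF-8 (flag on)
my $d = "\x{fc}";          # same character, normally held as one byte

# Characters versus bytes of internal storage
printf "pack U : %d char, %d internal byte(s)\n",
    length($c), bytes::length($c);    # 1 char, 2 bytes (matches CUR = 2)
printf "\\x{fc} : %d char, %d internal byte(s)\n",
    length($d), bytes::length($d);

# Perl compares by character, so storage does not affect equality
printf "same string: %s\n", ($c eq $d ? "yes" : "no");

# Byte length of the UTF-8 encoding, whatever the internal storage is
printf "UTF-8 encoded length: %d and %d\n",
    length(encode("UTF-8", $c)), length(encode("UTF-8", $d));
```

If the goal is the number of bytes the string occupies once written out as UTF-8, encoding explicitly and taking the length of the result is likely the safer route, since it does not depend on how Perl happens to store the string.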