in reply to How does the built-in function length work?

No, it works with decoded strings (i.e. it doesn't have to guess encodings).

Re^2: How does the built-in function length work?
by PerlOnTheWay (Monk) on Dec 02, 2011 at 14:12 UTC

    But I don't have to decode the string before using it

      Maybe in your string, then, the number of octets and the number of characters are the same?

      The following shows that Perl does not guess the encoding of strings but assumes it:

      # Perl assumes it's a Latin-1 string
      > perl -MEncode -wle print+length(qq(\x{c3}\x{a4}))
      2

      # Perl gets told to decode the string from UTF-8
      > perl -MEncode -wle print+length(decode('UTF-8',qq(\x{c3}\x{a4})))
      1

      # My terminal is Latin-1, which happens to match Perl's default assumption
      > perl -MEncode -wle print(length(decode('Latin-1',qq(ä))))
      1

      Update: choroba pointed out that I mispasted the second example - now corrected.

      It depends... sometimes you do have to decode them, sometimes you don't, because Perl (or some module etc.) has already done it for you.

      In any case, for Perl to be able to work with character strings (as opposed to byte/octet strings), the string must have been decoded somehow into Perl's internal Unicode representation.
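One common case where Perl has "already done it for you" is reading through a decoding I/O layer. The following is only a minimal sketch, assuming UTF-8 input; the in-memory filehandle and the variable names are illustrative stand-ins for a real file, not something from the original question.

```perl
use strict;
use warnings;

# Two octets: the UTF-8 encoding of "ä" (U+00E4).
my $octets = "\xc3\xa4";

# Read the same octets back through a decoding layer. An in-memory
# filehandle stands in for a real file here (an assumption of this sketch).
open my $fh, '<:encoding(UTF-8)', \$octets or die "open failed: $!";
my $text = <$fh>;
close $fh;

my $raw_length     = length($octets);   # counts octets
my $decoded_length = length($text);     # counts characters

print "raw: $raw_length, via :encoding layer: $decoded_length\n";
```

Because the layer decodes on read, no explicit call to decode is needed before length counts characters.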

      It also works on encoded strings; in that case it counts octets.

      Be aware of what you are feeding to the length function: you must keep track of the encoding state yourself, because Perl won't.
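To make the octets-versus-characters distinction concrete, here is a small sketch using the Encode module, assuming the data is UTF-8. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 encoding of "ä" (U+00E4) is the two octets C3 A4.
my $octets = "\xc3\xa4";

# Undecoded string: length counts octets.
my $octet_count = length($octets);

# Decoded string: length counts characters.
my $chars      = decode('UTF-8', $octets);
my $char_count = length($chars);

# Encoding the character string again yields the two octets.
my $reencoded_count = length(encode('UTF-8', $chars));

print "octets: $octet_count, characters: $char_count, re-encoded: $reencoded_count\n";
```

The same function returns 2 or 1 for "the same" text depending only on whether the string was decoded first, which is why you have to know which kind of string you are holding.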