
the length function returns the length (in bytes)

No, it does not. It returns the number of characters in its argument (or in $_ when called without an argument). See the entry for length in perlfunc.

Note that length counts every character, including control characters like CR, LF, and TAB, and not just printable characters.
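
A minimal sketch of that point, assuming a made-up sample string (the variable names are illustrative only). It also shows that chomp removes only the trailing newline, so a CR left over from a CRLF line ending is still counted:

```perl
use strict;
use warnings;

# Hypothetical sample line: six letters plus TAB, CR and LF.
my $line = "abc\tdef\r\n";

# length counts all nine characters, printable or not.
my $raw_length = length $line;

# chomp removes the trailing input record separator ($/, "\n" by default).
chomp(my $chomped = $line);

# The CR is still part of the string, so eight characters remain.
my $chomped_length = length $chomped;

print "before chomp: $raw_length, after chomp: $chomped_length\n";
```

If the CR should go as well, something like `s/\r?\n\z//` in place of chomp would handle both line endings.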

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^3: Length and Chomp ??
by biohisham (Priest) on Aug 22, 2009 at 18:53 UTC
    You are exactly right that length counts every character, including control characters. But the same perldoc entry for length says: "If the EXPR is in Unicode, you will get the number of characters, not the number of bytes." Turning that statement around, it strongly implies that if the EXPR is not in Unicode, we get its length in bytes rather than in characters.

    Update: I had the notion that, in programming, one character can be represented by one byte. afoken's gracious contribution below has reinforced that notion.

    Hence, what you said, that we get the number of characters, holds true for Unicode values. What I said, that length gives the byte length for values not in Unicode, holds true as well, since for those values characters are bytes :).


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.

      Some bean counting:

      With a Unicode argument, length returns the number of characters in the argument. Unicode has the (not so) new / unusual / odd property that a character may be represented by more than one byte.

      With a non-Unicode / pre-Unicode / legacy encoding argument, length still returns the number of characters in the argument. Those legacy encodings have the old / usual / familiar property that a character is represented by exactly one byte.

      So, there is no need to remember any special cases. length always returns the character count.

      Before Unicode support was added to Perl, there was no need to distinguish between byte and character, because both were equal. And as long as you don't work with Unicode, they still are. The quote from perlfunc, "if the EXPR is in Unicode, you will get the number of characters, not the number of bytes", is a hint that bytes and characters are different things when you work with Unicode, nothing more, nothing less.
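
The difference can be sketched with the core Encode module. The euro sign and the variable names are illustrative choices, not taken from the thread. The same data gives different counts depending on whether Perl sees it as undecoded octets or as a decoded character string:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 encoding of U+20AC (EURO SIGN), held as three raw octets.
my $octets = "\xE2\x82\xAC";

# Decoding turns the octets into a string containing one character.
my $chars = decode('UTF-8', $octets);

my $octet_count = length $octets;    # each undecoded octet counts as one character
my $char_count  = length $chars;     # one character after decoding

# To get a byte count for a character string, encode it first.
my $encoded_count = length encode('UTF-8', $chars);

print "octets: $octet_count, characters: $char_count, re-encoded: $encoded_count\n";
```

In every case length counts the characters of the string it is given. An undecoded string simply has one character per byte.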

      Alexander
