Re^4: Length and Chomp ??

Some bean counting:

With a Unicode argument, length returns the number of characters in the argument. Unicode has the (not so) new / unusual / odd property that a character may be represented by more than one byte.

With a non-Unicode / pre-Unicode / legacy encoding argument, length still returns the number of characters in the argument. The single-byte legacy encodings (ISO-8859-1 and friends) have the old / usual / familiar property that a character is represented by exactly one byte.

So, there is no need to remember any special cases. length always returns the character count.

Before Unicode support was added to Perl, there was no need to distinguish between byte and character, because both were equal. And as long as you don't work with Unicode, they still are. The quote from perlfunc, "if the EXPR is in Unicode, you will get the number of characters, not the number of bytes", is a hint that bytes and characters are different things when you work with Unicode, nothing more, nothing less.
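
A minimal sketch to make the counting concrete. The strings and variable names are made up for illustration; Encode is the core module. The first string is a decoded character string, the other two are byte strings, where every byte counts as one character:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{20AC}" is the euro sign: one Unicode character.
my $chars = "a\x{20AC}";                # character string, 2 characters
my $utf8  = encode('UTF-8', $chars);    # byte string, 1 + 3 bytes

# "\x{E9}" is e-acute: one character, and one byte in ISO-8859-1.
my $latin1 = encode('ISO-8859-1', "caf\x{E9}");

print length($chars),  "\n";    # 2 (characters)
print length($utf8),   "\n";    # 4 (bytes, each one a "character")
print length($latin1), "\n";    # 4 (characters == bytes)
```

In all three cases length counts characters. What differs is whether the string holds decoded text or encoded bytes.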

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
