Re: Size of scalar in bytes

Re: Re: Size of scalar in bytes by hardburn (Abbot) on Nov 17, 2003 at 17:45 UTC
That gives the number of characters in `$scalar`, which may or may not equal the number of bytes, depending on the encoding (ASCII? UTF-8? Full Unicode?) and probably a bunch of other things I don't even know about. To get the number of bytes (as the OP asked), you need to `use bytes;` before calling `length`.
----
I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident. -- Schemer
`: () { :|:& };:`
Note: All code is untested, unless otherwise stated
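To make the character/byte distinction concrete, here is a minimal sketch. The sample string and the variable names `$chars` and `$bytes` are illustrative assumptions, and `utf8::upgrade` is called only to force Perl's internal UTF-8 representation so the two counts differ.

```perl
use strict;
use warnings;

# Illustrative sample: "caf" followed by U+00E9 (e with acute accent).
my $scalar = "caf\x{e9}";

# Force the internal UTF-8 representation; without this, Perl may store
# the string one byte per character and both counts would be 4.
utf8::upgrade($scalar);

# Plain length counts characters.
my $chars = length $scalar;

# Under "use bytes", length counts the bytes of the internal representation.
my $bytes = do { use bytes; length $scalar };

print "characters: $chars\n";   # characters: 4
print "bytes:      $bytes\n";   # bytes:      5
```

Note that `use bytes` reports the size of whatever Perl happens to store internally, which is not necessarily the size the string would have in a file or on the wire.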
Re: Re: Re: Size of scalar in bytes by Itatsumaki (Friar) on Nov 17, 2003 at 17:54 UTC
I'm curious: is there a way to get the number of bytes per character for the encoding in use? If so, I imagine you could do: `my $len = length($scalar) * $current_encoding_byte_per_char;` Is that plausible?
Re: Re: Re: Re: Size of scalar in bytes by hardburn (Abbot) on Nov 17, 2003 at 18:05 UTC
In common cases, yes. For example, you can always be sure that ASCII is 8 bits per character (well, 7 really, but nobody stores it like that in practice). It gets a little harder with weird encodings like RAD-50, where each character takes 5 and a third bits (yup, a non-integer number of bits, since three characters are packed into one 16-bit word). Once you start thinking in terms of Unicode, you should basically give up trying to figure out how many bytes a given character takes. Even UTF-8 is a variable-width encoding: a character takes anywhere from one to four bytes, depending on its code point. So unless you're working on the dark internals of handling Unicode, just `use bytes` (which you should probably have done even if you weren't using Unicode). If you're interested, see http://www.sidhe.org/~dan/blog/archives/000255.html.
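A short sketch of why a single bytes-per-character multiplier does not work for UTF-8. The helper name `utf8_bytes_for` and the sample characters are assumptions made for illustration; `encode` comes from the core `Encode` module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of bytes one character occupies in UTF-8.
sub utf8_bytes_for {
    my ($char) = @_;
    return length encode('UTF-8', $char);
}

# Four sample characters from different parts of the Unicode range.
for my $char ("A", "\x{e9}", "\x{20ac}", "\x{1F600}") {
    printf "U+%04X takes %d byte(s) in UTF-8\n",
        ord($char), utf8_bytes_for($char);
}
# U+0041 takes 1 byte(s) in UTF-8
# U+00E9 takes 2 byte(s) in UTF-8
# U+20AC takes 3 byte(s) in UTF-8
# U+1F600 takes 4 byte(s) in UTF-8
```

Because the width varies per character, the total byte count has to be measured on the actual string rather than computed from its character count.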
Re: Re: Re: Re: Size of scalar in bytes by grantm (Parson) on Nov 17, 2003 at 18:06 UTC
"is there a way to get the number of bytes per character for the encoding in use?"
Some encodings use a fixed number of bytes per character, but others, like UTF-8, do not. As hardburn pointed out, if `use bytes` is in effect then `length()` returns the length in bytes rather than characters: `sub byte_length { use bytes; return length $_[0]; }`
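As a hedged companion to `byte_length`, the sketch below also measures the string after encoding it explicitly, which answers "how many bytes in encoding X" rather than "how many bytes does Perl store internally". The `encoded_length` helper and the sample string are assumptions, not part of the original replies. The documentation for the `bytes` pragma in current Perl releases discourages its use outside debugging, so the explicit-encoding route may be the safer one today.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The helper from the post: bytes in Perl's internal representation.
sub byte_length { use bytes; return length $_[0]; }

# Hypothetical companion: bytes once the string is encoded in $encoding.
sub encoded_length {
    my ($string, $encoding) = @_;
    return length encode($encoding, $string);
}

# Euro sign followed by "100": 4 characters.
my $s = "\x{20ac}100";

printf "characters: %d\n", length $s;                       # 4
printf "internal:   %d\n", byte_length($s);                 # 6
printf "UTF-8:      %d\n", encoded_length($s, 'UTF-8');     # 6
printf "UTF-16LE:   %d\n", encoded_length($s, 'UTF-16LE');  # 8
```

The same four characters occupy a different number of bytes in each encoding, which is why the target encoding has to be named before "size in bytes" has a definite answer.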