Juerd has asked for the wisdom of the Perl Monks concerning the following question:

I send you this node in order to have your advi...nevermind.

Is there an efficient way to get a string's byte length instead of its character length? The two solutions I came up with myself are rather slow and rely on non-utf8 string weirdness in Perl 5.6.

# Prototyped ($), just like length. Please do not bitch about that.
sub byte_length ($) {
    my ($copy) = @_;
    my $counter;
    $counter++ while $copy =~ s/.//s;
    return $counter;
}

sub byte_length ($) {
    return scalar split //, $_[0];
}

Other versions of byte_length are appreciated, 'cause I don't like abusing bugs. (Especially when it's already fixed in a new version :)

++ vs lbh qrpbqrq guvf hfvat n ge va Crey :)
Nabgure bar vs lbh qvq fb jvgubhg ernqvat n znahny svefg.
-- vs lbh hfrq OFQ pnrfne ;)
    - Whreq

Replies are listed 'Best First'.
Re: Byte length
by Ido (Hermit) on Mar 04, 2002 at 19:49 UTC
    use the bytes pragma..
    $x=chr(536);
    print "Char length: ".length($x)."\n";
    print "Bytes length: ".bytes_length($x)."\n";
    sub bytes_length($){
        use bytes;
        length $_[0];
    }
      Thanks! I didn't know about the bytes pragma :(


Re: Byte length
by Matts (Deacon) on Mar 05, 2002 at 08:13 UTC
    The correct way is to de-unicode the string into bytes, and take the length of that:

    sub byte_length { return length pack("C0A*", shift); }

    Don't try this on 5.6.0 - like most Unicode things it's probably quite broken there.

    Update: Oops, looks like "use bytes" is officially the right way to do it. It's even in the "bytes" man page. You can also use bytes::length() directly if you've loaded the bytes module previously.
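
For comparison, here is a minimal sketch of the "turn the string into bytes, then take the length" idea, placed next to the bytes pragma. It assumes the core Encode module is available (it is not mentioned in this thread), and utf8_byte_length is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);    # assumption: Encode is installed (core in newer Perls)

# Hypothetical helper: length of the string's UTF-8 encoding, in bytes.
sub utf8_byte_length {
    my ($string) = @_;
    return length encode_utf8($string);
}

my $x = chr(536);    # a single character above 255, as in Ido's reply

print "Char length: ", length($x), "\n";
print "Byte length (use bytes):   ", do { use bytes; length $x }, "\n";
print "Byte length (encode_utf8): ", utf8_byte_length($x), "\n";
```

For chr(536) both byte counts agree, since the character takes two bytes in UTF-8. They can differ for characters in the range 128 to 255: use bytes reports the length of Perl's internal representation, which may store such a character in a single byte, while encode_utf8 always counts the UTF-8 encoding.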

Re: Byte length
by webadept (Pilgrim) on Mar 05, 2002 at 08:27 UTC
    This is kinda odd. I was looking at the entry for length in the Camel book, and my copy says that length returns the size of a scalar in bytes?

    --- snippage from the book ---

    length

    length EXPR

    This function returns the length in bytes of the scalar value EXPR. If EXPR is omitted, the function returns the length of $_, but be careful that the next thing doesn't look like the start of an EXPR, or the tokener will get confused. When in doubt, always put in parentheses.

    Do not try to use length to find the size of an array or hash. Use scalar @array for the size of an array, and scalar keys %hash for the size of a hash. (The scalar is typically dropped when redundant, which is typical.)

    --- unsnippage ---

    Just thought I would drop this into the fray, if fray there would be.

    webadept.net

      You have an older Camel.

      The latest edition says ...

      length
          length EXPR
          length

      The function returns the length in characters of the scalar value EXPR.

      (Emphasis mine.)

      And then goes on to say ...

      To find the length of a string in bytes rather than characters, say:

          $blen = do { use bytes; length $string; };

      or:

          $blen = bytes::length($string); # must use bytes first

      perldoc -f length in recent Perls (5.6+) says something similar, thanks to the introduction of Unicode/UTF8 support.

          --k.