moseley has asked for the wisdom of the Perl Monks concerning the following question:

I know I can use:
$byte_length = do { use bytes; length $str };
with newer Perls. But I need to use that in a script that could be used with $] >= 5.00503 (or maybe even before thanks to some ISPs). So, how do I know when to use it (which versions) and also how to code it -- can't wrap "use bytes" in an eval.

This thread discusses some other methods of detecting the string length, although I'd rather get direct access to the length, as Devel::Peek does, than count bytes in a loop.

Replies are listed 'Best First'.
Re: Portable string length in bytes
by bart (Canon) on Mar 31, 2003 at 20:52 UTC
    Try
    my $bytecount = length pack 'C0a*', $string;
    The pack, when used this way, will wrap the original bytes in a string that, in 5.6 and above, is marked as a byte string — not UTF-8. So length will do the right thing. In earlier versions, all strings are byte strings, anyway.

    In a similar manner, one can use 'U0' on 5.6 and above, to mark the resulting string as UTF-8 — again, without altering the bytes. So it's very similar to the UTF-8-twiddling routines _utf8_on and _utf8_off in Encode with 5.8 and above, except that this here is a function. For more info, see the entry on pack in a recent version of perlfunc.

      Thanks, that looks like it will work.

      Something rubs me the wrong way about using pack, though. I'm getting the length of a document, which could be a very large document, so it seems very inefficient (compared to just somehow fetching the current length from the SV).

      I need to find time to learn more about utf-8/unicode support in Perl. For example, looking at Devel::Peek output the SV has the CUR length, which is what I want. I wonder if length() on a UTF string requires counting up the chars each time it's used or if the character length is also stored in the SV.

      Thanks for the help,

      BTW -- is there any trick for conditionally using a pragma like use bytes; in an older Perl?

        BTW -- is there any trick for conditionally using a pragma like use bytes; in an older Perl?
        Sure. Create a bogus module file bytes.pm and place it somewhere in @INC. All it needs to contain, is something like "1;", so it loads OK. Older perls always work in bytes, anyway.
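A run-time version check is another way to sidestep the compile-time pragma. The following is only a minimal sketch: the helper name byte_length is hypothetical, and it assumes that bytes.pm (shipped with 5.6 and later) provides bytes::length, while older perls never reach that branch and simply use the built-in length, since all their strings are byte strings.

```perl
use strict;

# Hypothetical helper: return the length of $str in bytes.
# On 5.6 and later, load bytes.pm at run time and call bytes::length.
# On older perls every string is a byte string, so plain length is enough.
sub byte_length {
    my ($str) = @_;
    if ($] >= 5.006) {
        require bytes;                # loaded only when this branch runs
        return bytes::length($str);   # resolved at run time, not compile time
    }
    return length $str;
}

my $doc = "1234" . chr(4000) . "abcd";
print byte_length($doc), "\n";
```

Because require and the fully qualified call are both resolved at run time, an older perl should compile this without ever looking for bytes.pm, and no copy of the string is made just to measure it.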
Re: Portable string length in bytes
by pg (Canon) on Mar 31, 2003 at 20:01 UTC
    unpack with "b*" could be a solution:
    use strict;
    my $a = "1234".chr(4000)."abcd";
    {
        use bytes;
        print length($a), "\n";    # 11 bytes
    }
    print length($a), "\n";    # 9 chars
    print length(unpack("b*", $a)) / 8;    # 11 bytes
    I tried it with 5.8 and it seems fine. However, I leave two things open:
    1. Is unpack with "b*" supported in your old versions? (I cannot test, as I don't have those old versions.)
    2. For long strings, I guess performance would be an issue.