in reply to Portable string length in bytes

Try
my $bytecount = length pack 'C0a*', $string;
Used this way, pack copies the original bytes into a new string that, in 5.6 and above, is marked as a byte string, not UTF-8. So length will do the right thing and count bytes. In earlier versions, all strings are byte strings anyway.

In a similar manner, one can use 'U0' on 5.6 and above to mark the resulting string as UTF-8, again without altering the bytes. So it's very similar to the UTF-8-twiddling routines _utf8_on and _utf8_off in Encode with 5.8 and above, except that pack is a function that returns a new string, while those routines change the flag on their argument in place. For more info, see the entry on pack in a recent version of perlfunc.
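
For a concrete feel of the flag flipping on 5.8 and above, here is a minimal sketch that uses the Encode routines named above rather than pack. The sample string and the variable names are made up for illustration. One caveat, as far as I know: pack's treatment of UTF-8 strings changed in 5.10, so on newer perls the 'C0a*' trick may simply hand back the original characters, and the result is worth checking there.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: one character above 255, so perl stores it
# internally as three UTF-8 bytes with the UTF-8 flag set.
my $string = "\x{263A}";
my $chars  = length $string;     # counts characters

my $copy = $string;              # work on a copy: the flag is changed in place
Encode::_utf8_off($copy);        # same bytes, now treated as a byte string
my $bytes = length $copy;        # counts bytes

Encode::_utf8_on($copy);         # same bytes, treated as UTF-8 again
my $restored = length $copy;

print "chars=$chars bytes=$bytes restored=$restored\n";
```

Only the flag changes between the three length calls; the stored bytes stay the same throughout.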

Re: Re: Portable string length in bytes
by Anonymous Monk on Mar 31, 2003 at 21:14 UTC
    Thanks, that looks like it will work.

    Something rubs me the wrong way about using pack, though. I'm getting the length of a document, which could be very large, so this seems very inefficient (compared to just somehow fetching the current length from the SV).

    I need to find time to learn more about utf-8/unicode support in Perl. For example, looking at Devel::Peek output the SV has the CUR length, which is what I want. I wonder if length() on a UTF string requires counting up the chars each time it's used or if the character length is also stored in the SV.

    Thanks for the help,

    BTW -- is there any trick for conditionally using a pragma like use bytes; in an older Perl?

      BTW -- is there any trick for conditionally using a pragma like use bytes; in an older Perl?
      Sure. Create a bogus module file bytes.pm and place it in one of the directories in @INC. All it needs to contain is something like "1;", so it loads OK. Older perls always work in bytes anyway.
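
If dropping a stub file into @INC is not an option, another approach (a different technique from the stub, and only a sketch) is to load the pragma from a BEGIN block guarded by a version check. The 5.006 cutoff and the sample string are assumptions for illustration.

```perl
use strict;

# 'use bytes;' is shorthand for BEGIN { require bytes; bytes->import },
# so the same steps can be wrapped in a compile-time version check.
BEGIN {
    if ($] >= 5.006) {    # assumption: bytes.pm ships with 5.6 and above
        require bytes;
        bytes->import;    # byte semantics for the rest of this file
    }
}

# Hypothetical sample: 1 character, stored as 3 bytes of UTF-8.
my $string    = "\x{263A}";
my $bytecount = length $string;    # counts bytes while the pragma is active
print "$bytecount\n";
```

With the pragma in effect, length returns the stored byte count directly, so no copy of a large document is made just to measure it.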