I see use bytes; without any utf8::upgrade or utf8::downgrade. That usually indicates code that suffers from "The Unicode Bug", i.e. code whose result depends on how Perl happens to store the string internally.

sub bytelen(_) { require bytes; return bytes::length($_[0]); }

should be

sub utf8len(_) { utf8::upgrade($_[0]); require bytes; return bytes::length($_[0]); }

Or the same without bytes:

sub utf8len(_) { require Encode; utf8::upgrade($_[0]); Encode::_utf8_off($_[0]); my $utf8len = length($_[0]); Encode::_utf8_on($_[0]); return $utf8len; }
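
To make the difference concrete, here is a minimal sketch that runs both approaches on the same one-character string in each internal storage format. The names bytelen_demo and utf8len_demo are hypothetical. They are variants of the subs above that work on a copy of the argument, which is an assumption of this sketch and not part of the original code.

```perl
use strict;
use warnings;
require bytes;

# Hypothetical variant of bytelen: counts the bytes of the internal
# buffer exactly as Perl happens to store it.
sub bytelen_demo {
    my ($s) = @_;
    return bytes::length($s);
}

# Hypothetical variant of utf8len: forces the UTF-8 internal format
# first, so the count no longer depends on the storage format.
sub utf8len_demo {
    my ($s) = @_;          # work on a copy; the caller's scalar is untouched
    utf8::upgrade($s);
    return bytes::length($s);
}

my $downgraded = "\xE9";   # U+00E9, stored as a single byte internally
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);  # same string, now stored as two bytes (UTF-8)

print "strings are equal\n" if $downgraded eq $upgraded;
printf "downgraded: bytelen=%d utf8len=%d\n",
    bytelen_demo($downgraded), utf8len_demo($downgraded);
printf "upgraded:   bytelen=%d utf8len=%d\n",
    bytelen_demo($upgraded), utf8len_demo($upgraded);
```

The two scalars compare equal, yet the bytes-only version reports 1 for the first and 2 for the second, while the upgrading version reports 2 for both.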

Update: Added non-bytes alternative.

Re^5: Best Way to Get Length of UTF-8 String in Bytes?
by tchrist (Pilgrim) on Apr 24, 2011 at 06:01 UTC
    And just which part of
    "That assumes that the strings are Unicode strings with their UTF‑8 flags on."
    didn’t you understand?
      FWIW, if it is easy to check, code might as well check instead of merely assuming :)
      That doesn't affect anything I said.