Re: Counting bytes in a Unicode document

Every time I see people using UTF-8 I think they massively overcomplicate it. Usually all you need to do is decode() the bytes you read in and encode() your scalars before you write them out.

Example: read the file as raw bytes, then get the number of Unicode characters by decoding the UTF-8 into a scalar of Unicode characters and taking its length (length $data on its own is the byte count):

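use Encode qw(decode);
# $utf8file holds the path to the UTF-8 encoded file
open my $FH, '<:raw', $utf8file or die "Cannot open $utf8file: $!";
read $FH, my $data, -s $FH;              # slurp the whole file as raw bytes
print length decode('UTF-8', $data);     # number of characters, not bytes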
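To make the byte/character difference concrete without needing a file, here is a minimal sketch on an in-memory string. The sample string and the variable names are my own, purely for illustration; only encode() and decode() from the Encode module are assumed.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: "naive" with i-diaeresis (U+00EF), a space,
# and a smiley (U+263A). That is 7 characters.
my $chars = "na\x{EF}ve \x{263A}";

# encode() turns characters into UTF-8 bytes, as they would sit in a file.
my $bytes = encode('UTF-8', $chars);

# decode() goes the other way, from raw bytes back to characters.
my $decoded = decode('UTF-8', $bytes);

my $byte_count = length $bytes;      # U+00EF takes 2 bytes, U+263A takes 3
my $char_count = length $decoded;

printf "bytes: %d, characters: %d\n", $byte_count, $char_count;
# prints "bytes: 10, characters: 7"
```

So length gives you whichever count matches the kind of string you hand it: bytes before decode(), characters after.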