in reply to Counting bytes in a Unicode document
Every time I see people working with UTF-8, I think they massively overcomplicate it. All you usually need to do is call decode() and encode() on your scalars.
Example: read the file as raw bytes, then get the number of Unicode characters by decoding the UTF-8 bytes into a scalar of Unicode characters:
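use Encode;
open my $FH, '<:raw', $utf8file or die "Can't open $utf8file: $!";  # raw layer: no decoding on read
read $FH, my $data, -s $FH;                                         # slurp the whole file as bytes
print length decode("utf8", $data);                                 # characters, not bytes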
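Since the original question was about bytes, here is a minimal sketch that returns both counts from the same data. The helper name byte_and_char_counts is made up for illustration, and I'm assuming the input really is UTF-8. The byte count is just length() on the scalar before decoding; the character count is length() after.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): takes a string of raw
# UTF-8 octets and returns (byte count, character count).
sub byte_and_char_counts {
    my ($octets) = @_;
    # 'UTF-8' is Encode's strict variant; the post above uses the lax "utf8".
    my $chars = decode('UTF-8', $octets);   # octets -> Perl character string
    return (length($octets), length($chars));
}

# "naive" with a diaeresis on the i: 5 characters, but U+00EF takes
# 2 bytes in UTF-8, so 6 bytes in total.
my $octets = encode('UTF-8', "na\x{EF}ve");
my ($nbytes, $nchars) = byte_and_char_counts($octets);
print "$nbytes bytes, $nchars characters\n";   # prints "6 bytes, 5 characters"
```

With a file, you would pass in the $data read through the '<:raw' layer, as in the one-liner above.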