
use utf8;
use open ':std', ':utf8';
use feature 'unicode_strings';

...

for my $file ( @files ) {

    open my $FH, '<', $file or die "Cannot open '$file' because: $!";
    my $size = read $FH, my $data, -s $FH or die "Cannot read from '$file' because: $!";
    $size == length( $data ) or die "Error reading from '$file'\n";
    close $FH;

...

This will always result (if $data contains Unicode characters) in dying with the message "Error reading from '$file'\n".

Unlike in the OP, here with perl v5.36.3 on FreeBSD 14 (using the code below), the numbers from the length and read functions match (226); they obviously do not match the size in bytes (282). Is it a matter of the perl version, and/or of how one sets up the encoding layer?

perl check-size.pl ./data.mixed
$VAR1 = {
          '-s' => 282,
          'length' => 226,
          'read' => 226,
          'tr' => 226
        };


# Usage: perl5.36.3 check-size.pl ./data-mixed

use v5.32;
use warnings;
use warnings qw[ FATAL utf8 ];
use open qw[ :encoding(UTF-8) :std ];

use Data::Dumper;
local $Data::Dumper::Sortkeys = 1;

my $data = $ARGV[0]
   or die qq[Give a file to count sizes of different kinds.\n];

my %stat = ( q/-s/ => -s $data );

open( my $fh, q/</, $data )
   or die qq/Cannot open $data to read: $!/;

my $string;
$stat{q/read/} = read( $fh, $string, $stat{q/-s/} )
                  or die qq/read from \$fh failed: $!/;

close( $fh )
   or die qq/Could not close $data: $!/;

$stat{q/length/} = length( $string );
$stat{q/tr/} = $string =~ tr///c;

print( Dumper( \%stat ) );
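
To isolate the layer question from the sample data, here is a minimal sketch that reads one small file twice, once through `:raw` and once through `:encoding(UTF-8)`. The three-character sample string and the helper name `read_count` are made up for illustration; this is not the OP's code. It relies only on the documented behaviour of `read`, which counts characters rather than bytes once the handle has a UTF-8 layer.

```perl
use strict;
use warnings;
use File::Temp qw[ tempfile ];

# Hypothetical sample: 3 characters that take 1 + 2 + 3 = 6 bytes in UTF-8.
my $text = "a\x{e9}\x{20ac}";

# Write the sample out as UTF-8 to a temporary file.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode( $tmp, q/:encoding(UTF-8)/ );
print {$tmp} $text;
close( $tmp )
   or die qq/Could not close $path: $!/;

my $bytes = -s $path;

# Read up to $bytes units through the given layer; return what read()
# reported and what length() sees in the buffer.
sub read_count {
    my ( $layer ) = @_;
    open( my $fh, "<$layer", $path )
       or die qq/Cannot open $path to read: $!/;
    my $got = read( $fh, my $buf, $bytes );
    defined $got
       or die qq/read from $path failed: $!/;
    close( $fh );
    return ( $got, length $buf );
}

my ( $raw_read, $raw_len ) = read_count( q/:raw/ );
my ( $utf_read, $utf_len ) = read_count( q/:encoding(UTF-8)/ );

print "-s = $bytes; :raw read = $raw_read, length = $raw_len\n";
print "-s = $bytes; :encoding(UTF-8) read = $utf_read, length = $utf_len\n";
```

With the decoding layer, read and length should agree with each other (in characters) but not with -s; with `:raw`, all three should agree (in bytes). That matches the 226 versus 282 result above, though it does not by itself explain the mismatch reported in the OP.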

Base64-encoded generated data -- if anyone would rather have it than run above "make-data.pl" -- with SHA256 sum of a1e81919b72403bd3c9d95979dae8d928354cf64de8d999bcad00aec4298cf76 (decoded value has the sum of 02bec20353169b58ea51558fa7d6c4316bd13f5e4a3d976484e8ed7700935962) ...

Tj9EOzg3aHosYU9NbDtrLmZQcUt4M3l3dl8hR1IyLEpvNHJCMVdRVTVUaVpkQ2pMSVM2XzlFY1hl
dUhuIXRtLlkwRlZzZ3A/QS0tYk4/RDs4N2h6LGFPTWw7ay5mUHFLeDN5dwpGNVRuRUtsOGrwn5it
MGhQTzHvuI/ig6Mxc0lvQUx24KWsUyAgcHo2M0NiWuClr/CfpK9XVjJrR23wn5iSUnU3VWNI4KWv
YXFneDR34KWnWPCfmI85ZDHvuI/ig6Pwn6SjZk5Z8J+YrXlNSuClpmXgpajgpaZp4KWqdFFyREJG
NVRuRUtsOGrwn5itMGhQTzHvuI/ig6Mxc0lvQUx24KWsUyAgcHo2M0NiWuClr/CfpK9XVjIK

Update: *ugh*...

The Base64-encoded sample has a different checksum after being downloaded from here than on the machine it was posted from. The checksum for the decoded value is still the same as posted.
Removed the program that produced the sample Unicode output, as I had apparently changed it (while (re)editing this post) such that it produced output different from the above-mentioned Base64-encoded text (for which the size result is shown).

In reply to Re: Counting bytes in a Unicode document by parv
in thread Counting bytes in a Unicode document by jwkrahn
