in reply to regex: searching for multi-byte characters

I don't understand multibyte characters that well either, but I've had to deal with them a little. So this may answer your question, or it may just show that I'm more confused than you are. If it's true that "regular expressions match characters instead of bytes", could you do something like this?
    while (<FILE>) {
        while ($_ =~ /\G(.)/g) {
            my $char = $1;
            # code here to check whether $char is one you want to tally...
        }
    }
The code to check for what you want to tally might look like this:
    my $u = unpack('U', $char);     # code point of the character
    $tally{$char}++ if $u > 127;    # anything above the ASCII range (0-127)
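Putting the two pieces together, here is a rough sketch of how the whole thing might look. I'm assuming the file is UTF-8 and opening it with an encoding layer, since as far as I understand it the regex only sees characters rather than bytes once the input has been decoded. The helper name tally_non_ascii and the file name input.txt are made up for illustration, and I've used ord() in place of unpack('U', ...) since it gives the same code point.

```perl
use strict;
use warnings;

# Hypothetical helper: tally every character above the ASCII range in a
# string that has already been decoded into Perl characters.
sub tally_non_ascii {
    my ($text, $tally) = @_;
    while ($text =~ /\G(.)/gs) {      # walk the string one character at a time
        my $char = $1;
        $tally->{$char}++ if ord($char) > 127;
    }
    return $tally;
}

# Usage sketch: the file name and its UTF-8 encoding are both assumptions.
my $path = 'input.txt';
if (open(my $fh, '<:encoding(UTF-8)', $path)) {
    my %tally;
    tally_non_ascii($_, \%tally) while <$fh>;
    close $fh;

    binmode(STDOUT, ':encoding(UTF-8)');
    printf "U+%04X %s %d\n", ord($_), $_, $tally{$_} for sort keys %tally;
}
```

If the data is in some other encoding, the layer name in the open call would have to change to match, otherwise the counts will be for the wrong characters.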