[\x{0} -\x{7e}]

##</code><code>##

 Code Points            1st Byte  2nd Byte  3rd Byte  4th Byte

   U+0000..U+007F       00..7F
   U+0080..U+07FF       C2..DF    80..BF
   U+0800..U+0FFF       E0        A0..BF    80..BF
   U+1000..U+CFFF       E1..EC    80..BF    80..BF
   U+D000..U+D7FF       ED        80..9F    80..BF
   U+D800..U+DFFF       ******* ill-formed *******
   U+E000..U+FFFF       EE..EF    80..BF    80..BF
  U+10000..U+3FFFF      F0        90..BF    80..BF    80..BF
  U+40000..U+FFFFF      F1..F3    80..BF    80..BF    80..BF
 U+100000..U+10FFFF     F4        80..8F    80..BF    80..BF


Note the A0..BF in U+0800..U+0FFF, the 80..9F in U+D000..U+D7FF, the 90..BF in U+10000..U+3FFFF, and the 80..8F in U+100000..U+10FFFF. The A0..BF and 90..BF "gaps" are caused by legal UTF-8 avoiding non-shortest encodings: it is technically possible to UTF-8-encode a single code point in different ways, but that is explicitly forbidden, and the shortest possible encoding should always be used. So that's what Perl does. The other two restrictions have different causes: 80..9F after ED excludes the surrogates U+D800..U+DFFF, and 80..8F after F4 keeps the result at or below U+10FFFF.
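To make the gaps concrete, here is a minimal sketch that feeds a few byte sequences to the core Encode module's strict 'UTF-8' decoder and reports which ones it accepts. The helper name is_strict_utf8 and the sample byte strings are my own, chosen for illustration only.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $bytes decodes cleanly as strict UTF-8.
sub is_strict_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC keeps it from consuming $bytes.
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Illustrative samples, one per "gap" in the table
my @samples = (
    [ "\x2F",             'U+002F, shortest form'            ],
    [ "\xC0\xAF",         'U+002F, overlong 2-byte form'     ],
    [ "\xE0\x80\xAF",     'U+002F, overlong 3-byte form'     ],
    [ "\xE0\xA0\x80",     'U+0800, shortest form'            ],
    [ "\xED\xA0\x80",     'U+D800, a surrogate'              ],
    [ "\xF4\x90\x80\x80", 'above U+10FFFF'                   ],
);

for my $sample (@samples) {
    my ($bytes, $label) = @$sample;
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    printf "%-12s %-32s %s\n", $hex, $label,
        is_strict_utf8($bytes) ? 'accepted' : 'rejected';
}
```

Only the two shortest-form sequences should come back as accepted. Everything else falls into one of the gaps in the table.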

Another way to look at it is via bits: 

 Code Points                    1st Byte   2nd Byte  3rd Byte  4th Byte

                    0aaaaaaa     0aaaaaaa
            00000bbbbbaaaaaa     110bbbbb  10aaaaaa
            ccccbbbbbbaaaaaa     1110cccc  10bbbbbb  10aaaaaa
  00000dddccccccbbbbbbaaaaaa     11110ddd  10cccccc  10bbbbbb  10aaaaaa
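
The bit layout above can be turned into code almost directly. What follows is a rough sketch, not how Perl does it internally: utf8_bytes is a hypothetical name, and each branch fills the a/b/c/d bit groups by shifting and masking.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes (as integers) for one code point.
sub utf8_bytes {
    my ($cp) = @_;
    die "not a Unicode scalar value\n"
        if $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF);

    # 0aaaaaaa
    return ($cp) if $cp < 0x80;

    # 110bbbbb 10aaaaaa
    return (0xC0 | ($cp >> 6),
            0x80 | ($cp & 0x3F)) if $cp < 0x800;

    # 1110cccc 10bbbbbb 10aaaaaa
    return (0xE0 | ($cp >> 12),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)) if $cp < 0x10000;

    # 11110ddd 10cccccc 10bbbbbb 10aaaaaa
    return (0xF0 | ($cp >> 18),
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F));
}

for my $cp (0x41, 0xE9, 0x20AC, 0x1F600) {
    printf "U+%04X -> %s\n", $cp,
        join ' ', map { sprintf '%02X', $_ } utf8_bytes($cp);
}
```

Because each range uses the smallest layout that fits, this sketch can only ever produce shortest-form sequences.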

##</code><code>##

my $pattern = qr{
  [\xC2-\xDF][\x80-\xBF]                     |  # two-byte sequences
  \xE0[\xA0-\xBF][\x80-\xBF]                 |  # three-byte, lead byte E0
  [\xE1-\xEC\xEE\xEF][\x80-\xBF][\x80-\xBF]     # other three-byte forms; I leave off ED and the rest here
}x;
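
For completeness, here is a sketch of what the pattern might look like with every row of the table filled in. It is simply the table transcribed into byte classes, one alternative per row; the names $utf8_char and is_wellformed_utf8 are mine, and it assumes the input is a byte string (read without a :utf8 layer).

```perl
use strict;
use warnings;

# One alternative per row of the table; matches one well-formed UTF-8 character.
my $utf8_char = qr{
    [\x00-\x7F]                          |  # U+0000..U+007F
    [\xC2-\xDF][\x80-\xBF]               |  # U+0080..U+07FF
    \xE0[\xA0-\xBF][\x80-\xBF]           |  # U+0800..U+0FFF
    [\xE1-\xEC][\x80-\xBF]{2}            |  # U+1000..U+CFFF
    \xED[\x80-\x9F][\x80-\xBF]           |  # U+D000..U+D7FF
    [\xEE-\xEF][\x80-\xBF]{2}            |  # U+E000..U+FFFF
    \xF0[\x90-\xBF][\x80-\xBF]{2}        |  # U+10000..U+3FFFF
    [\xF1-\xF3][\x80-\xBF]{3}            |  # U+40000..U+FFFFF
    \xF4[\x80-\x8F][\x80-\xBF]{2}           # U+100000..U+10FFFF
}x;

# Hypothetical helper: true if the whole byte string is well-formed UTF-8.
sub is_wellformed_utf8 {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:$utf8_char)*\z/ ? 1 : 0;
}

# Split a byte string into its encoded characters
my @chars = "a\xC3\xA9\xE2\x82\xAC" =~ /($utf8_char)/g;
printf "%d characters found\n", scalar @chars;
printf "overlong C0 AF well-formed? %s\n",
    is_wellformed_utf8("\xC0\xAF") ? 'yes' : 'no';
```

Dropping the first alternative gives a pattern that matches only non-ASCII characters, which seems to be what the shortened version is after.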

##</code><code>##

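use Encode qw(decode);

# FILE must be read as raw bytes (no :utf8 layer) for the byte-level pattern to apply
while (<FILE>) {
    # /x allows whitespace in the pattern; /o stops Perl recompiling it on every pass;
    # /g finds every match on the line rather than only the first
    while (/($pattern)/gox) {
        $chars{$1}++;    # $1 is the part of the string that actually matched
    }
}

foreach (sort keys %chars) {
    # unpack is not called inside a double-quoted string, so decode the
    # matched bytes explicitly and print the code point they encode
    printf "U+%04X matched %d times.\n", ord(decode('UTF-8', $_)), $chars{$_};
}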