in reply to Dealing with non-ascii characters when reading file.
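(This is just a toy version to try out on files that aren't seriously large. I'd do it differently for general use.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    die "Usage: $0 file.name\n" unless ( @ARGV == 1 and -f $ARGV[0] );

    my $file = shift;
    open( my $fh, '<', $file ) or die "Can't open $file: $!\n";
    binmode $fh;         # read raw bytes, no encoding layer

    local $/ = undef;    # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;

    # Count each byte value, keyed by its two-digit hex code
    my %char_hist;
    for my $c ( split //, $data ) {
        $char_hist{ sprintf( "%02x", ord( $c )) }++;
    }
    for my $c ( sort keys %char_hist ) {
        printf "%s\t%d\n", $c, $char_hist{$c};
    }
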
It's sometimes surprising what you can learn about a file just by looking at a histogram of its byte values - seeing which values occur, and which ones don't.
(If you happen to know that a file contains utf8-encoded text, you can learn a lot by looking at a histogram of its Unicode characters; I posted a script for that too: unichist -- count/summarize characters in data.)
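
For files too large to slurp comfortably, one possible approach is to read fixed-size chunks and tally into a 256-slot array. This is only a sketch of that idea, not necessarily what "doing it differently for general use" would look like: the sub name byte_histogram and the 64 KB default chunk size are my own assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count byte values
# in a file by reading fixed-size chunks instead of slurping it whole.
sub byte_histogram {
    my ( $path, $chunk_size ) = @_;
    $chunk_size ||= 65536;    # assumed buffer size; tune as needed

    open( my $fh, '<:raw', $path ) or die "Can't open $path: $!\n";

    my @count = (0) x 256;    # one slot per possible byte value
    my $buf;
    while ( read( $fh, $buf, $chunk_size ) ) {
        # unpack 'C*' turns the chunk into a list of byte values 0..255
        $count[$_]++ for unpack( 'C*', $buf );
    }
    close $fh;
    return \@count;
}

# If a single file is named on the command line, print the same
# "hex value <TAB> count" report, skipping bytes that never occur.
if ( @ARGV == 1 and -f $ARGV[0] ) {
    my $count = byte_histogram( $ARGV[0] );
    for my $byte ( 0 .. 255 ) {
        printf "%02x\t%d\n", $byte, $count->[$byte] if $count->[$byte];
    }
}
```

Memory use stays roughly constant regardless of file size, since only one chunk and the 256 counters are held at a time.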