in reply to Perl -T vs Mime::Types
-T examines only the first block or so of the file (see -X). File::Type merely guesses a file type by searching for a few magic numbers, like file(1) does. Neither can be reliable.
If you want to check for a file that contains only ASCII characters, you have to check the entire file. There is no other way.
I guess you also want to check for a sane file size, perhaps a few hundred kBytes or a few MBytes. On a modern computer, slurping an entire file within that limit is no big problem.
You may want something like this (untested):
    -f $filename or die "$filename is not a file";
    # avoid a second stat() syscall by using the special handle "_"
    (-s _ < 100_000) or die "$filename is too large";
    my $blob = do {
        open my $f, '<:raw', $filename or die "Can't open $filename: $!";
        local $/;   # slurp mode
        <$f>;       # slurp
        # leaving the do block auto-closes $f
    };
    # Accept only CR, LF, TAB, and printable characters from 0x20 to 0x7E.
    $blob =~ /^[\r\n\t\x20-\x7E]*$/s or die "$filename is not ASCII";
If you want to handle significantly larger files, you have to read smaller blocks (perhaps 1 MByte each) and check each block for its "ASCIIness". Abort at the first failed block.
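The block-wise variant could look roughly like this (equally untested in production, so treat it as a sketch). The sub name is_ascii_file and its optional block size argument are made up for illustration. Because the check works byte by byte, it does not matter where the block boundaries fall.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the file contains only CR, LF, TAB and
# printable ASCII (0x20 to 0x7E), 0 otherwise. Dies on I/O errors.
sub is_ascii_file {
    my ($filename, $blocksize) = @_;
    $blocksize ||= 1024 * 1024;    # 1 MByte per read, an arbitrary default

    open my $f, '<:raw', $filename
        or die "Can't open $filename: $!";

    while (1) {
        my $n = read($f, my $block, $blocksize);
        defined $n or die "Can't read $filename: $!";
        last if $n == 0;           # end of file reached

        # Abort at the first block that contains a forbidden byte.
        return 0 if $block =~ /[^\r\n\t\x20-\x7E]/;
    }
    return 1;                      # $f is closed when it goes out of scope
}
```

Call it as is_ascii_file($filename) or die "$filename is not ASCII"; memory use stays at roughly one block, no matter how large the file is.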
Alexander
Re^2: Perl -T vs Mime::Types
by AnomalousMonk (Archbishop) on Sep 20, 2017 at 00:43 UTC