in reply to Re: Shotgun.pl - Shoots Holes in Files
in thread Shotgun.pl - Shoots Holes in Files

harangzsolt33, further to ++Fletch's spot-on reply, just a bit more detail, in case you find it useful.

From perldoc -X file test, note that the -T/-B file test operators (which tell whether a file is an ASCII or UTF-8 text file or not) work via a heuristic guess, as follows:

The first block or so of the file is examined to see if it is valid UTF-8 that includes non-ASCII characters. If so, it's a -T file. Otherwise, that same portion of the file is examined for odd characters such as strange control codes or characters with the high bit set. If more than a third of the characters are strange, it's a -B file; otherwise it's a -T file.

Also, any file containing a zero byte in the examined portion is considered a binary file. (If executed within the scope of a use locale which includes LC_CTYPE, odd characters are anything that isn't a printable nor space in the current locale.) If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on an empty file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in: next unless -f $file && -T $file.
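To make that concrete, here is a minimal sketch of the -f before -T idiom from the doc excerpt above. The classify helper, the sample file names and their contents are all made up for illustration; only the -f, -z, -T and -B behaviour comes from perldoc.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper (not from perldoc): label a path using the file tests.
sub classify {
    my ($path) = @_;
    # Cheap stat-based test first, so directories etc. are never read.
    return 'not a plain file' unless -f $path;
    # Both -T and -B return true on an empty file, so handle that case explicitly.
    return 'empty' if -z $path;
    # -T reads the first block or so of the file and applies the heuristic.
    return -T $path ? 'text' : 'binary';
}

# Made-up sample data, written to a temporary directory that is removed at exit.
my $dir     = tempdir( CLEANUP => 1 );
my %samples = (
    'plain.txt' => "Just some ordinary ASCII text.\n" x 10,
    'nul.dat'   => "looks like text\0but contains a zero byte\n",
    'empty.txt' => '',
);

for my $name ( sort keys %samples ) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $samples{$name};
    close $fh or die "Cannot close $path: $!";
    printf "%-10s %s\n", $name, classify($path);
}
printf "%-10s %s\n", 'directory', classify($dir);
```

The nul.dat sample should come out as binary even though it is almost entirely printable, because of the zero byte rule. The explicit -z check is a design choice of this sketch: without it, an empty file would be reported as text simply because -T is tested first.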

See also:

References added later

👁️🍾👍🦟