Thanks to all of you for the responses.
I'm trying to re-learn, and eventually master, programming in Ada.
As simple learning exercises, I'm trying to write **minimal** emulations of several UNIX/Linux tools/utilities (such as grep, cat, head, wc, etc.) in Ada.
I'd like to avoid garbling my terminal by running any of these emulations on a "binary" file, so I'd like to call "file" or "perl -T" from Ada via a system call.
I'm not good enough at Ada at this point to implement my own "-T" file operator, but that's one of the many things I'm aiming for.
I'd sure like to know how Perl determines ("heuristic guess" says perldoc!) that the file is or is not a binary so that I can do that in Ada.
I like the warning I get when using "less" with a file with binary content; it warns me! How does it determine that?
The problem I still have with the Perl one-liner is that I don't know how to embed the filename I want to query inside double quotes, or how to escape the slash path separators so that they are not interpreted as a regex.
The reason why I went with the "perl -T" route over the "file" route was that I wanted a simple "text" or "binary" response, nothing I needed to parse further.
Thanks,
Ken Wolcott
$ pwd
/home/ken/tmp
$ perl -E 'say +(qw{binary text})[-T $ARGV[0]]' /usr/bin/perl
binary
$ perl -E 'say +(qw{binary text})[-T $ARGV[0]]' file_perl
text
$ perl -E 'say +(qw{binary text})[-T $ARGV[0]]' ./file_perl
text
$ perl -E 'say +(qw{binary text})[-T $ARGV[0]]' ~/tmp/file_perl
text
$ perl -E 'say +(qw{binary text})[-T $ARGV[0]]' ~ken/tmp/file_perl
text
$ perl -E 'say +(qw{binary text})[-T $ARGV[0]]' ../tmp/file_perl
text
$ perl -E 'say +(qw{binary text})[-T $ARGV[0]]' ../../../usr/bin/perl
binary
Note that the Perl code is identical throughout; only the filename argument changes.
You can supply a filename or an absolute/relative pathname in whatever format you want (as an argument to that command).
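Because the filename arrives in @ARGV rather than inside the Perl code, nothing needs quoting or escaping beyond ordinary shell quoting. For a caller that makes a system call (from Ada, say), an exit status may be easier to consume than printed output. Here is a minimal sketch, assuming perl is on the PATH; the function name is_text is made up for illustration:

```shell
#!/bin/sh
# is_text FILE: exit 0 if FILE is a regular file that perl's -T heuristic
# considers text, exit 1 otherwise (a missing file also yields exit 1).
# "is_text" is a hypothetical helper name, not part of perl.
is_text() {
    # "--" ends perl's own switch processing, so a name that starts with "-"
    # is still passed through as $ARGV[0].
    perl -e 'exit( (-f $ARGV[0] && -T $ARGV[0]) ? 0 : 1 )' -- "$1"
}

# Classify every path given on the command line.
# Quoting "$f" is enough; slashes and spaces need no extra escaping.
for f in "$@"; do
    if is_text "$f"; then echo "$f: text"; else echo "$f: binary"; fi
done
```

The -f test comes first so that directories and other non-regular files are never read.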
In case you were wondering, file_perl still exists from earlier examples:
$ pwd
/home/ken/tmp
$ ls -l file_perl
-rw-r--r-- 1 ken None 105 Aug 13 23:46 file_perl
$ file file_perl
file_perl: ASCII text
$ cat file_perl
/usr/bin/perl: PE32+ executable (console) x86-64 (stripped to external PDB), for MS Windows, 11 sections
I'd sure like to know how Perl determines ("heuristic guess" says perldoc!) that the file is or is not a binary so that I can do that in Ada.
The method used is described in the documentation for the file test operators (perldoc -f -X):
The -T and -B tests work as follows.
The first block or so of the file is examined to see if it is valid UTF-8 that includes non-ASCII characters.
If so, it's a -T file.
Otherwise, that same portion of the file is examined for odd characters such as strange control codes or characters with the high bit set.
If more than a third of the characters are strange, it's a -B file; otherwise it's a -T file.
Also, any file containing a zero byte in the examined portion is considered a binary file.
(If executed within the scope of a use locale which includes LC_CTYPE, odd characters are anything that isn't a printable nor space in the current locale.)
If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block.
Both -T and -B return true on an empty file, or a file at EOF when testing a filehandle.
Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in next unless -f $file && -T $file.
To see the gory details of the implementation, search for pp_fttext in pp_sys.c in the Perl source code (which is also used for pp_ftbinary).
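For anyone wanting to port the idea to another language, the steps described above can be sketched with standard shell tools. This is a simplified approximation, not Perl's implementation. It assumes a 512-byte block (perldoc only says "the first block or so"), it omits the UTF-8 step entirely (so UTF-8 text that is heavy in non-ASCII characters may be reported as binary), and the set of "ordinary" bytes is my own assumption. The function name classify is hypothetical.

```shell
#!/bin/sh
# classify FILE: print "text" or "binary" using a simplified version of the
# heuristic quoted above. Assumptions: 512-byte block, no UTF-8 check,
# "C" locale, and a hand-picked set of ordinary bytes.
classify() {
    [ -f "$1" ] && [ -r "$1" ] || { echo "classify: cannot read $1" >&2; return 2; }
    block=512
    # Number of bytes actually examined (at most $block).
    total=$(( $(head -c "$block" < "$1" | wc -c) ))
    # An empty file passes both -T and -B in Perl; report it as text here.
    if [ "$total" -eq 0 ]; then echo text; return 0; fi
    # Any zero byte in the examined block means binary.
    nonzero=$(( $(head -c "$block" < "$1" | LC_ALL=C tr -d '\000' | wc -c) ))
    if [ "$nonzero" -ne "$total" ]; then echo binary; return 0; fi
    # Delete the bytes treated as ordinary (BS, TAB, LF, FF, CR, ESC and
    # printable ASCII); whatever is left counts as "odd".
    odd=$(( $(head -c "$block" < "$1" \
        | LC_ALL=C tr -d '\010-\012\014\015\033\040-\176' | wc -c) ))
    # More than a third odd bytes means binary.
    if [ $(( odd * 3 )) -gt "$total" ]; then echo binary; else echo text; fi
}

# Classify every path given on the command line.
for f in "$@"; do classify "$f"; done
```

The same three checks (empty file, zero byte, proportion of odd bytes) translate directly into a loop over a byte buffer in Ada or any other language.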
Command Line Examples
As the bash command line examples below show, the Linux file command is far more sophisticated: it recognizes many specific file formats, while Perl's -B file test only guesses whether a file is binary or text using a simple heuristic:
$ perl -e '-B q{perl-5.38.0.tar.gz} and warn "binary file\n"'
binary file
$ file perl-5.38.0.tar.gz
perl-5.38.0.tar.gz: gzip compressed data, last modified: Sun Jul 2 22:26:17 2023, max compression, from Unix, original size modulo 2^32 97505280
$ perl -e '-B q{/usr/bin/perl} and warn "binary file\n"'
binary file
$ file /usr/bin/perl
/usr/bin/perl: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6ecb6a6b0956a41de73c75facebcb500b525b860, for GNU/Linux 3.2.0, stripped
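If a single word is all that is needed, file can also be asked for just the character encoding, which it reports as binary for non-text data, so no further parsing is required. Here is a hedged sketch, assuming a file(1) that supports the -b and --mime-encoding options; the function name kind is hypothetical:

```shell
#!/bin/sh
# kind FILE: print "text" or "binary" based on file's encoding guess.
# "kind" is a hypothetical helper name; -b suppresses the "FILE:" prefix.
kind() {
    [ -f "$1" ] || { echo "kind: not a regular file: $1" >&2; return 2; }
    enc=$(file -b --mime-encoding -- "$1") || return 2
    case $enc in
        binary) echo binary ;;
        *)      echo text ;;   # e.g. us-ascii, utf-8, iso-8859-1
    esac
}

# Classify every path given on the command line.
for f in "$@"; do kind "$f"; done
```

The two tools use different heuristics, so their answers may disagree on edge cases such as empty files.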
See Also
Updated: Added "Command Line Examples" and "See Also" sections.