in reply to file type

File::Basename and a hash like
my %extensions = ( 'c' => 'C Source', 'cpp' => 'C++ Source', 'pl' => 'Perl Script', 'dat' => 'Data (Text?) File', 'csv' => 'Comma Separated Values', 'txt' => 'Text' );
are what I would use. If you can't trust people to use the extensions appropriately, I know of no other Unix way to tell file types apart. Files are either binary or ASCII (Update: AgentM has a good point below, and I don't want to belabor the issue). End of story. If you are intrepid, I suppose you could check the C source for things like #include <stdio.h> just for kicks, but I wouldn't rely on that any more than I would rely on #!/usr/bin/perl to indicate a Perl script.
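
Here is a rough sketch of that File::Basename plus hash lookup. The helper name describe_file and the 'Unknown' fallback are my own assumptions for illustration, and the lookup is lowercased so that .CSV and .csv are treated alike.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my %extensions = (
    c   => 'C Source',
    cpp => 'C++ Source',
    pl  => 'Perl Script',
    dat => 'Data (Text?) File',
    csv => 'Comma Separated Values',
    txt => 'Text',
);

# describe_file is a hypothetical helper name, not part of File::Basename.
sub describe_file {
    my ($path) = @_;

    # The suffix pattern makes fileparse return the last ".ext" as $suffix
    # (an empty string when the name has no dot).
    my (undef, undef, $suffix) = fileparse($path, qr/\.[^.]*/);

    # Drop the leading dot and normalise case before the hash lookup.
    (my $ext = lc $suffix) =~ s/^\.//;

    return $extensions{$ext} // 'Unknown';
}

print describe_file('/tmp/report.CSV'), "\n";   # prints "Comma Separated Values"
```

This only looks at the name, so it is exactly as trustworthy as the people naming the files.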

Re: Re: file type
by AgentM (Curate) on Feb 09, 2001 at 03:53 UTC
    Actually, that's not entirely correct. Under UNIX, a file is a file is a file. There is no differentiation between binary and ASCII, which is why the standard UNIX streams have no need for binmode. A simple stream of bytes is what you get when you read from a stream. DOS-based systems are the ones that require binmode, and they do differentiate between ASCII and binary: in text mode, for example, CR LF line endings are translated on the way in and out. Pagers like to make a good guess and warn you if you try to page an actual binary file, but they do this by reading a bit of the file and checking for non-ASCII bytes. In fact, under UNIX, the system itself cannot tell you what a stream-type file contains (devices and ttys are easy to check for).
    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
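
The pager-style guess described above could be sketched roughly like this. The helper name looks_binary, the 512-byte sample and the 30% threshold are all arbitrary assumptions of mine, not what any particular pager does. Perl's built-in -T and -B file tests make a similar guess.

```perl
use strict;
use warnings;

# looks_binary is a hypothetical helper; the sample size and the 30%
# threshold are arbitrary choices made for this sketch.
sub looks_binary {
    my ($path, $sample_size) = @_;
    $sample_size ||= 512;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read($fh, my $buf, $sample_size);
    close $fh;
    die "Cannot read $path: $!" unless defined $read;

    return 0 if $read == 0;     # empty file: nothing suggests binary
    return 1 if $buf =~ /\0/;   # a NUL byte is a strong hint of binary data

    # Count bytes that are neither printable ASCII nor tab, LF or CR.
    my $odd = () = $buf =~ /[^\x09\x0A\x0D\x20-\x7E]/g;

    return $odd / $read > 0.30 ? 1 : 0;
}

# Report a guess for each file named on the command line.
for my $file (@ARGV) {
    printf "%s: %s\n", $file, looks_binary($file) ? 'probably binary' : 'probably text';
}
```

It is still only a guess: a file full of Latin-1 or UTF-8 text can trip it, which is rather the point of the post above.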
      And another point, which I came across while reading Learning Perl, is that Perl itself does not differentiate between binary and text files. I can read a binary file into a string and manipulate the string value, truncate it, etc. just as I can read a text file into a string.

      The authors say that this feature of Perl is a result of Perl using a full byte (8 bits, so 256 possible values) to store each character of a string. ASCII characters only need 7 bits apiece, so a string limited to ASCII could not easily hold a binary file. It is hard to say why Larry chose to give each character a full 8 bits, thus allowing strings to contain a binary file.

      I suspect it had more to do with a need to represent Asian languages than it did with any desire to store binary files as strings.
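
To illustrate the point about holding binary data in an ordinary string, here is a small sketch. The helper name slurp_bytes is my own invention; it opens the file with the :raw layer so that nothing is translated on platforms that care.

```perl
use strict;
use warnings;

# slurp_bytes is a hypothetical helper that reads a whole file, untranslated.
sub slurp_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                  # no record separator: read everything at once
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# A string holding every possible byte value once, NUL included.
my $bytes = join '', map { chr } 0 .. 255;
printf "%d bytes\n", length $bytes;              # prints "256 bytes"

# Ordinary string operations work on it just as they do on text.
my $head = substr $bytes, 0, 4;
printf "first four: %s\n", unpack 'H*', $head;   # prints "first four: 00010203"
```

A string read with slurp_bytes can be truncated, searched or written back out the same way.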