Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks

A simple one here: I have a small script which spell-checks other files and corrects them where necessary (amongst other things). I therefore need a functional equivalent of the Unix `file` command, and I am only interested in "text" or "ASCII" files.

From your experience, is the built-in -T test good enough? If not, what would be a good alternative (I'm trying to avoid spawning external programs)? The target platform is Unices of varying flavours.

Thanks

Re: How reliable is -T as a test for ASCII files?
by calin (Deacon) on May 12, 2004 at 17:40 UTC

    From perldoc -f -X:

    The "-T" and "-B" switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it's a "-B" file, otherwise it's a "-T" file. Also, any file containing null in the first block is considered a binary file. If "-T" or "-B" is used on a filehandle, the current stdio buffer is examined rather than the first block. Both "-T" and "-B" return true on a null file, or a file at EOF when testing a filehandle. Because you have to read a file to do the "-T" test, on most occasions you want to use a "-f" against the file first, as in "next unless -f $file && -T $file".
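To make the description above concrete, here is a rough Perl approximation of that heuristic. It is a sketch under stated assumptions, not Perl's actual implementation: the 512-byte block size, the exact set of characters counted as "odd", and the name looks_like_text are illustrative choices. Newer perls also accept valid UTF-8 as text, and the threshold wording may differ between versions; this sketch ignores both.

```perl
use strict;
use warnings;

# Rough approximation of the -T heuristic described in perldoc -f -X.
# Assumptions (not Perl's exact internals): a 512-byte block, and "odd"
# meaning a byte that is neither printable ASCII nor common whitespace.
# Returns 1 for text, 0 for binary, undef if the file cannot be read.
sub looks_like_text {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return undef;
    my $got = read($fh, my $buf, 512);
    close $fh;
    return undef unless defined $got;
    return 1 if $got == 0;                 # empty file counts as text
    return 0 if index($buf, "\0") >= 0;    # any NUL byte means binary

    # Count bytes outside printable ASCII and \t \n \r \f \b ESC;
    # bytes with the high bit set are counted as odd here.
    my $odd = () = $buf =~ /[^\x20-\x7E\t\n\r\f\b\e]/g;

    # Binary if more than 30% of the block is odd (integer arithmetic).
    return $odd * 10 > $got * 3 ? 0 : 1;
}
```

Usage is just `looks_like_text($file)`; remember to treat undef (unreadable) separately from 0 (binary).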

      "next unless -f $file && -T $file"

      That's a little bit tautologous. If it's not a file, it's never going to be a text file :) And if it is a text file, my guess is that it's also a file :)

      The only question with file testing (when not running as root) is whether you have permission to read the file. We run some file tests through sudo on web servers when testing users' files; otherwise you get unexpected results when testing files with permissions such as 600 or 700.
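Related to the permissions point: when Perl cannot open a file, -T returns undef rather than a plain false, so a script can report unreadable files instead of quietly treating them as binary. A minimal sketch, where classify and its return strings are hypothetical names chosen for illustration, and the -f guard follows the perldoc advice quoted above:

```perl
use strict;
use warnings;

# Hypothetical helper: sort a path into one of a few categories so that
# files we lack permission to read are reported, not silently skipped.
sub classify {
    my ($path) = @_;
    return 'not a plain file' unless -f $path;   # perldoc's -f guard
    return 'unreadable' unless -r _;             # reuse the stat from -f
    my $is_text = -T $path;
    # -T returns undef (not just false) when the file cannot be opened.
    return "unreadable: $!" unless defined $is_text;
    return $is_text ? 'text' : 'binary';
}

for my $path (@ARGV) {
    print "$path: ", classify($path), "\n";
}
```

When run as root (or via sudo, as described above), -r succeeds for nearly everything, so the "unreadable" branch mostly matters for ordinary users.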

      cLive ;-)

Re: How reliable is -T as a test for ASCII files?
by zentara (Cardinal) on May 13, 2004 at 14:09 UTC
    The problem with -T is that it's not very "intelligent". It will look at a PostScript graphics file and call it text.
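One way to catch that particular case without spawning file(1) is to run -T first and then look at the first few bytes for signatures of formats that are text-like but not prose. A sketch, assuming a deliberately tiny, hypothetical signature list (PostScript files conventionally begin with %!, RTF with {\rtf); the real file(1) magic database is far larger, and is_plain_text is an illustrative name:

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny list of signatures for formats that
# usually pass -T but are not prose worth spell-checking. file(1) uses a
# much larger magic database; extend this list to suit your own files.
my @not_prose = (
    qr/\A%!/,          # PostScript ("%!PS-Adobe-...")
    qr/\A\{\\rtf/,     # Rich Text Format
);

sub is_plain_text {
    my ($path) = @_;
    return 0 unless -f $path && -T $path;   # Perl's built-in heuristic first
    open my $fh, '<:raw', $path or return 0;
    my $head = '';
    read($fh, $head, 64);                   # the signatures sit at the start
    close $fh;
    for my $sig (@not_prose) {
        return 0 if $head =~ $sig;
    }
    return 1;
}
```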

    So you might want to do some more intelligent filtering with a module like MIME::Types.

    use MIME::Types;
    my $mimetypes = MIME::Types->new;
    my MIME::Type $plaintext = $mimetypes->type('text/plain');
    print $plaintext->mediaType;   # text
    print $plaintext->subType;     # plain

    You could also do some extension testing depending on the type of files you have on your system.
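A minimal sketch of extension testing, using a hypothetical allow-list (%text_ext) and helper name (has_text_extension) that you would tailor to the files on your system. It checks only the name, so it is best combined with -T rather than used instead of it:

```perl
use strict;
use warnings;

# Hypothetical allow-list of extensions to treat as spell-checkable text;
# adjust it to the kinds of files you actually have.
my %text_ext = map { $_ => 1 } qw(txt text md pod);

sub has_text_extension {
    my ($path) = @_;
    # Take whatever follows the last dot in the final path component.
    my ($ext) = $path =~ /\.([^.\/]+)\z/;
    return 0 unless defined $ext;
    return $text_ext{ lc $ext } ? 1 : 0;
}

# Example: only spell-check files that pass both tests.
for my $path (@ARGV) {
    next unless has_text_extension($path) && -f $path && -T $path;
    print "$path\n";
}
```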

    I'm not really a human, but I play one on earth.
      That's exactly the kind of answer I was hoping for!

      Thanks very much for your help.