in reply to Perl -T vs Mime::Types

It might be beneficial to take a step back and look at the big picture end goal.

What is it that you are wanting to do that requires checking to see if a file contains plain ASCII text?

Replies are listed 'Best First'.
Re^2: Perl -T vs Mime::Types
by roperl (Beadle) on Sep 20, 2017 at 15:05 UTC
    Program is handling input files from various clients. These files can be sent in via sftp or ftp, with files encrypted by gpg. Files can also be zipped or compressed with gz. Once the files are unzipped, decompressed, or decrypted, I'm expecting a plain ASCII text file. I want to ensure the file is valid before moving it off to another program to handle it.
      These files can be sent in via sftp or ftp with files encrypted by gpg. Files can also be zipped or compressed with gz. Once the files are either unzipped, decompressed or decrypted I'm expecting a plain ASCII text file.

      Without knowing all the intricacies and ins and outs of your workflow pipeline, I'd almost say (in my admitted ignorance) that the whole "check for ASCII" test is a bit superfluous.

      Wouldn't the upstream process be responsible for checking if the decryption/decompression was successful, and wouldn't the downstream process be responsible for checking for well-formed data? Is there a particular case you are trying to guard against?

      It seems to me that a plain -T $file should be sufficient to catch a rogue encrypted and/or compressed file that made it past the first process without triggering an error (although what happens if the file is Base64 or otherwise ASCII-armored?)

        It seems to me that a plain -T $file should be sufficient to catch a rogue encrypted and/or compressed file that made it past the first process without triggering an error (although what happens if the file is Base64 or otherwise ASCII-armored?) - Yes that is exactly what I'm trying to guard against: sending the upstream program a file that somehow failed to get decrypted, decompressed or unzipped. The upstream process will do its own error checking is there is malformed content such as non-printable characters it will error out. So including any of that type of checking will be redundant. I'm not expecting any Base64 or ASCII armored input, but if somehow base64 or ASCII armored text is sent it I'll process it as valid ASCII text and the upstream program will reject based on the invalid content