saurabh.hirani has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

I am writing a sub to determine a file's type the way the Unix "file" command does (by using magic files). There are three major CPAN modules available for doing this:

  1. File::Type
  2. File::MimeInfo::Magic
  3. File::MMagic

File::Type, as its documentation says, was written to overcome the limitations of File::MimeInfo, but it fails to identify even basic files. For example, it identifies a text file as application/octet-stream.

File::MMagic is based on some Apache software. I found it better than File::Type in that it identifies text files correctly. But it identifies video files as application/octet-stream, and it identifies mp3 files as application/octet-stream too (File::Type identifies mp3 as audio/mp3).

File::MimeInfo::Magic is a derivative of File::MimeInfo that uses magic to determine file types. If magic fails, we have the option of falling back on the way File::MimeInfo checks (using the freedesktop database). I found it better than the other two in that it passed the simple text file checks, identified videos (checked for .flv and .m2v), and identified audio as audio/mpeg.

Here is a comparison of their outputs:

File::Type             video1.flv - application/octet-stream
File::MimeInfo::Magic  video1.flv - application/x-flash-video
File::MMagic           video1.flv - application/octet-stream

File::Type             fdcheck - application/x-executable-file
File::MimeInfo::Magic  fdcheck - application/x-executable
File::MMagic           fdcheck - application/octet-stream

File::Type             filetypes.pl - application/x-perl
File::MimeInfo::Magic  filetypes.pl - application/x-perl
File::MMagic           filetypes.pl - x-system/x-unix; executable /usr/bin/perl + script text

File::Type             latex-setup.tgz - application/x-gzip
File::MimeInfo::Magic  latex-setup.tgz - application/x-gzip
File::MMagic           latex-setup.tgz - application/x-gzip

File::Type             mail3.eml - message/rfc822
File::MimeInfo::Magic  mail3.eml - message/rfc822
File::MMagic           mail3.eml - message/rfc822

File::Type             video.m2v - application/octet-stream
File::MimeInfo::Magic  video.m2v - video/x-msvideo
File::MMagic           video.m2v - application/octet-stream

File::Type             utf - application/octet-stream
File::MimeInfo::Magic  utf - text/plain
File::MMagic           utf - text/plain

File::Type             audio.mp3 - audio/mp3
File::MimeInfo::Magic  audio.mp3 - audio/mpeg
File::MMagic           audio.mp3 - application/octet-stream

File::Type             file.txt - application/octet-stream
File::MimeInfo::Magic  file.txt - text/plain
File::MMagic           file.txt - text/plain
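For anyone who wants to repeat a comparison like this on their own files, here is a rough sketch of how it could be scripted. It is not the script that produced the output above. The sub name detect_all and the placeholder strings for missing modules or failed detection are made up for illustration. Each module is loaded at run time, so the script still runs if one of them is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Detectors in the same order as the comparison listing. Each entry pairs a
# module name with a sub that asks that module for the MIME type of a file.
my @detectors = (
    [ 'File::Type'            => sub { File::Type->new->checktype_filename($_[0]) } ],
    [ 'File::MimeInfo::Magic' => sub { File::MimeInfo::Magic::mimetype($_[0]) } ],
    [ 'File::MMagic'          => sub { File::MMagic->new->checktype_filename($_[0]) } ],
);

# Hypothetical helper: returns a list of [module, type] pairs for one file.
# A module that cannot be loaded, or a check that dies or returns undef,
# is reported with a placeholder string instead of aborting the run.
sub detect_all {
    my ($file) = @_;
    my @rows;
    for my $detector (@detectors) {
        my ($module, $check) = @$detector;
        my $type;
        if (eval "require $module; 1") {
            $type = eval { $check->($file) };
            $type = '(undetermined)' unless defined $type;
        }
        else {
            $type = '(not installed)';
        }
        push @rows, [ $module, $type ];
    }
    return @rows;
}

# Print one group of lines per file named on the command line.
for my $file (@ARGV) {
    printf "%-22s %s - %s\n", $_->[0], $file, $_->[1] for detect_all($file);
    print "\n";
}
```

Run it as, say, perl filetypes.pl video1.flv audio.mp3 file.txt. The results depend on the module versions and, for File::MimeInfo::Magic, on the freedesktop shared MIME database installed on the machine.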

The problem is that I don't know which of them conforms to the MIME types used in email messages. For example, File::MimeInfo (not File::MimeInfo::Magic) identifies tgz files as application/x-compressed-tar, while my email message has its content type as application/x-gzip.

Which of these modules should one use when creating MIME messages? That is, when I build an entity using the MIME::Entity "attach" method, I have to give the "Type", which I want to determine using one of these.
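To make the use case concrete, here is a minimal sketch of what I have in mind, assuming File::MimeInfo::Magic is the detector. The sub names type_for_attachment and build_message, the addresses, and the subject are placeholders. If detection is unavailable or fails, it falls back to application/octet-stream, which is generic but still describes any file correctly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a MIME type for $path, or the generic
# application/octet-stream when the module is missing or detection fails.
sub type_for_attachment {
    my ($path) = @_;
    my $type = eval {
        require File::MimeInfo::Magic;
        File::MimeInfo::Magic::mimetype($path);
    };
    return defined $type && length $type ? $type : 'application/octet-stream';
}

# Hypothetical helper: build a multipart message with one attachment whose
# "Type" comes from the detector above. Addresses and subject are placeholders.
sub build_message {
    my ($path) = @_;
    require MIME::Entity;
    my $top = MIME::Entity->build(
        Type    => 'multipart/mixed',
        From    => 'me@example.com',
        To      => 'you@example.com',
        Subject => 'File attached',
    );
    $top->attach(
        Path     => $path,
        Type     => type_for_attachment($path),
        Encoding => 'base64',
    );
    return $top;
}

# Usage: perl attach.pl latex-setup.tgz > message.eml
if (@ARGV) {
    build_message($ARGV[0])->print(\*STDOUT);
}
```

Swapping in another detector only means changing the body of type_for_attachment, so the choice of module stays in one place.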

Replies are listed 'Best First'.
Re: Comparing different MIME type checking modules
by DStaal (Chaplain) on May 21, 2009 at 16:49 UTC

    They all look like they are generating 'legal' MIME information. (Although I'm willing to be proved wrong on that.) The question is which produces the best MIME type for your files. From what you've shown, File::MimeInfo::Magic seems to do a fairly good job on a wide variety of files, although it is possible one of the others does a better job on the files you most commonly send.

    As for matching what your email client generates: it has its own system for doing this, and it may or may not generate a better MIME type for any specific file.

    Basically, there may be several different MIME types that could be used for a specific file, depending on the level of specificity available to the software on either end. In general more specific is better, but I don't think it is an error to list a more generic type as long as it still correctly describes the file.

      I agree that every email client has its own system for setting the "Type" header, but what shook me was this: if I were to write a Perl program that scans an email and reports whether it contains a .gz file, I would rely on the 'Content-type' header and would probably search for 'application/x-gzip'.

      Now, if the sender has created the mail using File::MimeInfo, my program would say there is no .gz file in the mail, because the 'Content-type' header was 'application/x-compressed-tar' rather than 'application/x-gzip'. My best bet in this case would be to know the different 'Content-type' values used for .gz files and match against each of them, i.e. match against application/x-gzip; if that fails, match against application/x-compressed-tar; if that fails too, maybe assume it's not a .gz file.
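The matching described above could be sketched roughly like this. The sub name is_gzip_content_type and the list of types are assumptions for illustration. The first and third types are the ones discussed in this thread, and application/gzip is added because it is also in use. Extend the list with whatever your senders actually emit.

```perl
use strict;
use warnings;

# Content types that may label a gzip-compressed attachment. This list is an
# assumption for illustration, not an exhaustive registry.
my %GZIP_TYPES = map { $_ => 1 } qw(
    application/x-gzip
    application/gzip
    application/x-compressed-tar
);

# Hypothetical helper: true when a Content-Type header value names one of the
# types above. Parameters such as '; name="x.tgz"' are ignored, and the
# comparison is case-insensitive.
sub is_gzip_content_type {
    my ($header_value) = @_;
    return 0 unless defined $header_value;
    my ($type) = $header_value =~ m{^\s*([^\s;]+)};
    return 0 unless defined $type;
    return $GZIP_TYPES{ lc $type } ? 1 : 0;
}

# Example: check a header value as it might appear in a message part.
my $header = 'application/x-compressed-tar; name="latex-setup.tgz"';
print is_gzip_content_type($header) ? "gzip attachment\n" : "not gzip\n";
```

A hash lookup keeps the check to a single test however many aliases are added, instead of a chain of "if that fails, try the next one" comparisons.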