inman has asked for the wisdom of the Perl Monks concerning the following question:

I am working with a large file system (on Windows 2003) that has accumulated a large number of documents in different formats over the years. The file system has been shared amongst team members to store their files. The local custom has been to change the file extension of a document to the initials of its owner. For example, a Word document belonging to John Fredrick Smith would get a JFS extension.

I want to iterate over the file system (using our old friend File::Find) and change the names of the files back to their more conventional extensions (or at least report on them). My problem is that I need to take the individual files and determine the mime-type from their content. Has anybody tried this before? What did you use?

Replies are listed 'Best First'.
Re: Discovering the MIME type of a document from its content
by davorg (Chancellor) on Nov 18, 2005 at 12:28 UTC

    How about File::Type?

    The docs say this:

    File::MMagic and File::MimeInfo perform the same job, but have a number of problems that led to the creation of this module.

    File::MMagic inlines a copy of the magic database, and uses a DATA filehandle, which causes problems when running under mod_perl.

    File::MimeInfo uses an external magic file, and relies on file extensions rather than magic to determine the mime type.

    As a result, File::Type uses a separate module and script to create the code at the core of this module, which means that there is no need to include a magic database at all, and that it is safe to run under mod_perl.

    See also File::Type::Builder, which generates the code at the heart of this module.
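
    A minimal, report-only sketch of how File::Type might be combined with File::Find for the task in the question. The sub name report_types and the command-line directory argument are made up for illustration, and the fallback to 'unknown' is an assumption about how you would want unreadable files handled. It changes nothing on disk.

```perl
use strict;
use warnings;
use File::Find;
use File::Type;

# Walk $root and return a list of [full path, MIME type] pairs.
# report_types is a hypothetical helper name, not part of either module.
sub report_types {
    my ($root) = @_;
    my $ft = File::Type->new();
    my @found;

    find(
        sub {
            return unless -f $_;                       # plain files only
            # The type is guessed from the file content, not the extension.
            my $type = $ft->checktype_filename($_);
            $type = 'unknown' unless defined $type;    # assumed fallback
            push @found, [ $File::Find::name, $type ];
        },
        $root
    );

    return @found;
}

# Run as a script: perl report.pl <directory>
unless (caller) {
    my $root = shift @ARGV or die "usage: $0 <directory>\n";
    print join("\t", @$_), "\n" for report_types($root);
}
```

    Running the report first and reading it over seems safer than renaming in the same pass, since content sniffing can misidentify a file.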

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Discovering the MIME type of a document from its content
by Corion (Patriarch) on Nov 18, 2005 at 12:30 UTC

    File::MMagic and File::MimeInfo are two modules that can try to find out the MIME type, and from the MIME type you can infer the default extension. I recommend that you keep the old extensions and append the new ones rather than replacing them. That way the "ownership" information encoded in the old extensions is preserved.
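
    A small sketch of the append-rather-than-replace idea, using only core Perl. The %ext_for table and the sub name appended_name are hypothetical; the table would need to be filled in with whatever MIME types your detection module actually reports.

```perl
use strict;
use warnings;

# Hypothetical lookup table from MIME type to conventional extension.
my %ext_for = (
    'application/msword' => 'doc',
    'application/pdf'    => 'pdf',
    'image/jpeg'         => 'jpg',
);

# Return the name with the conventional extension appended, keeping the
# old (initials) extension in place. Returns nothing when the type is not
# in the table or the name already ends with that extension.
sub appended_name {
    my ($name, $mime) = @_;
    my $ext = $ext_for{$mime} or return;
    return if $name =~ /\.\Q$ext\E\z/i;
    return "$name.$ext";
}

print appended_name('report.JFS', 'application/msword'), "\n";   # report.JFS.doc
```

    The actual rename could then be done with Perl's built-in rename once the proposed names have been checked, for example for collisions with existing files.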