sanPerl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Gurus,
My script gets input in two formats:
1) MS Word (i.e. .doc) and
2) plain text.
I need to process these files separately. I have written a simple module that extracts the file extension and processes the files accordingly. However, I am not happy with this approach, because a supplier can rename the extension to anything (e.g. "abcd.doc" can be renamed to "abcd.txt", but it would still be a DOC file).
Is there any other way to determine the type of an input file? I welcome all kinds of suggestions/hints.

Regards,
Sandeep

Replies are listed 'Best First'.
Re: Understanding File Type
by chromatic (Archbishop) on Mar 17, 2006 at 08:30 UTC
Re: Understanding File Type
by spiritway (Vicar) on Mar 17, 2006 at 08:42 UTC

    I think you can check the first few bytes of the file to see whether it is plain text or .DOC format. It appears that a Word .DOC file starts with the bytes D0 CF 11 E0 A1 B1 1A E1 (hex), which a text viewer shows as something like ÐÏ.à¡±.á and which do not in general resemble what you find in text files. I don't know how reliable this is, but my guess is that Word docs will have some odd characters in them early on, and that could be a way of identifying the type independent of the extension.
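
The byte check described above can be sketched in core Perl. This is only a rough illustration, not a complete detector: `sniff_type` is a hypothetical helper name, the 512-byte read size is an arbitrary choice, and the "text" test is a crude heuristic. Note also that the D0 CF 11 E0 A1 B1 1A E1 signature identifies the OLE2 container in general, which legacy Excel and PowerPoint files share, so a match means "probably an Office-style binary" rather than "definitely Word".

```perl
use strict;
use warnings;

# OLE2 compound-file signature found at the start of legacy .doc files.
my $OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Hypothetical helper: returns 'ole2', 'text', or 'unknown' based on the
# file's content, ignoring the file name entirely.
sub sniff_type {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read $fh, my $head, 512;    # 512 is an arbitrary sample size
    close $fh;
    die "Cannot read $path: $!" unless defined $got;

    return 'ole2' if substr($head, 0, 8) eq $OLE2_MAGIC;

    # Crude text check: non-empty sample with no control characters
    # other than tab, LF, VT, FF and CR.
    return 'text' if length $head && $head !~ /[\x00-\x08\x0E-\x1F]/;

    return 'unknown';
}

# Usage: perl sniff.pl somefile
print sniff_type($ARGV[0]), "\n" if @ARGV;
```

A renamed "abcd.txt" that is really a Word document would come back as 'ole2' here, since only the content is inspected.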

Re: Understanding File Type
by swaroop.m (Monk) on Mar 17, 2006 at 08:41 UTC
    Hi Sandeep, this is a lame solution, but you can always use the Win32 OLE object to read both txt and doc files; the extension does not really matter. Hope this helps. Thanks, roop
Re: Understanding File Type
by leocharre (Priest) on Mar 17, 2006 at 20:54 UTC
    use File::MMagic;

    my $path = '/path/to/file.doc';
    my $mm   = File::MMagic->new;    # use internal magic file
    my $t    = $mm->checktype_filename($path);
    # $t holds mime type - doesn't matter what the file extension is,
    # or if there is no extension
      Dear Monks,
      'MMagic' as well as 'file <namefile>' (on Linux) are really good solutions.
      Thanks to all for the suggestions.

      Regards,
      Sandeep
Re: Understanding File Type
by spadacciniweb (Curate) on Mar 17, 2006 at 08:31 UTC
    I suppose you work on the M$ Windows platform.
    Windows goes by the file extension... I don't
    know an answer, but you can look at this link:
    http://gnuwin32.sourceforge.net/packages/file.htm
    However, on GNU/Linux (and, I suppose, on all Unix-like systems), you can
    write in the shell
    file namefile
    and you get information about the file, including MS Windows files.
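
For calling the `file` command from a Perl script rather than from the shell, something along these lines might work. It is a sketch under assumptions: `mime_via_file` is a hypothetical helper name, a `file` binary is assumed to be on the PATH (on Windows, for example from the GnuWin32 package linked above), and that binary is assumed to accept the `--brief` and `--mime-type` options, which current versions do.

```perl
use strict;
use warnings;

# Hypothetical helper: ask file(1) for the MIME type of $path.
# Returns undef if the command is unavailable or exits with an error.
sub mime_via_file {
    my ($path) = @_;

    # List-form pipe open: $path is passed as a plain argument and
    # never goes through a shell.
    my $pid = open my $ph, '-|', 'file', '--brief', '--mime-type', '--', $path;
    return unless $pid;

    my $out = <$ph>;
    close $ph or return;    # non-zero exit status from file(1)
    return unless defined $out;

    chomp $out;
    return $out;
}

# Usage: perl mime.pl somefile
if (@ARGV) {
    my $type = mime_via_file($ARGV[0]);
    print defined $type ? "$type\n" : "could not determine type\n";
}
```

The exact MIME string reported for a given file depends on the version of `file` and its magic database, so it is safer to match on a pattern than on one fixed value.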