How best to identify & Categorize Source Code?

Zadeh has asked for the wisdom of the Perl Monks concerning the following question:

Some Perl apps I use take an ad-hoc approach to recognizing source code: they maintain an ever-growing list of extensions (.c, .cpp, .h, .pl, ...) and treat any file with a listed extension as source. I can think of a number of problems with this approach:

1) Every file has to have an extension. It's not uncommon for people to save scripts and makefiles without one.
2) It implicitly assumes a one-to-one mapping between each extension and the kind of content the file holds.
3) The list of extensions has to be maintained indefinitely.

There's got to be a better way. On *nix I would often do something like this:

$ file -s some_file.c

and then see:

some_file.c: ASCII C program text

This brings me to two questions: Is there a nice, tidy Perl module that does the same thing? If not, how best to implement it?
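One way to get the effect of the command above from Perl, without any extra modules, is to run file(1) through a pipe and read its description. This is only a sketch: it assumes a `file` binary on the PATH that accepts `-b` (brief output), and the names `describe_file` and `looks_like_source`, along with the keyword pattern, are illustrative assumptions, since the wording of the descriptions differs between versions of file.

```perl
use strict;
use warnings;

# Ask file(1) for a brief description of $path.
# Returns undef if the command cannot be run or prints nothing.
sub describe_file {
    my ($path) = @_;
    # List-form pipe open: no shell is involved, so odd file names are safe.
    open(my $fh, '-|', 'file', '-b', '--', $path) or return;
    my $desc = <$fh>;
    close $fh;
    return unless defined $desc;
    chomp $desc;
    return $desc;
}

# Hypothetical heuristic: treat descriptions that mention program text,
# source, or a script as source code. The pattern is a guess, not a spec.
sub looks_like_source {
    my ($desc) = @_;
    return 0 unless defined $desc;
    return $desc =~ /\b(?:program text|source|script)\b/i ? 1 : 0;
}

# Describe every file named on the command line.
for my $path (@ARGV) {
    my $desc = describe_file($path);
    printf "%s: %s%s\n", $path,
        (defined $desc ? $desc : 'unknown'),
        (looks_like_source($desc) ? ' [source]' : '');
}
```

This shells out once per file, so it may be slow on large trees, and it only works where file(1) is installed.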

Re: How best to identify & Categorizing Source Code? by perrin (Chancellor) on Mar 31, 2008 at 22:50 UTC
File::MMagic
Re^2: How best to identify & Categorizing Source Code? by Zadeh (Beadle) on Mar 31, 2008 at 23:20 UTC
I tried this at first, but so far the only thing it returns is "text/plain", which doesn't help much. What am I missing?
Re^3: How best to identify & Categorizing Source Code? by perrin (Chancellor) on Apr 01, 2008 at 01:49 UTC
It should be using a technique very similar to the "file" command. Try feeding it your /etc/magic file.
Re^4: How best to identify & Categorizing Source Code? by Anonymous Monk on Apr 01, 2008 at 16:22 UTC
Re^5: How best to identify & Categorizing Source Code? by perrin (Chancellor) on Apr 01, 2008 at 16:58 UTC
Re: How best to identify & Categorize Source Code? by apl (Monsignor) on Apr 01, 2008 at 09:55 UTC
If you're on *nix, you could read the first line (the #! line) to see which compiler, interpreter, or shell is invoked...
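Reading the first line can be sketched in a few lines of core Perl. The function names `interpreter_from_line` and `interpreter_of_file` are hypothetical, and the handling of `/usr/bin/env` is an assumption about how such scripts are commonly written; files without a #! line (C sources, makefiles) simply yield undef.

```perl
use strict;
use warnings;

# Return the interpreter name from a "#!" line, or undef if there is none.
sub interpreter_from_line {
    my ($line) = @_;
    return unless defined $line && $line =~ /^#!\s*(\S+)(?:\s+(\S+))?/;
    my ($prog, $arg) = ($1, $2);
    $prog =~ s{.*/}{};                 # /usr/bin/perl -> perl
    # "#!/usr/bin/env perl": the interpreter is env's first argument.
    if ($prog eq 'env' && defined $arg) {
        ($prog = $arg) =~ s{.*/}{};
    }
    return $prog;
}

# Read only the first line of $path and hand it to the parser above.
sub interpreter_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    my $first = <$fh>;
    close $fh;
    return interpreter_from_line($first);
}

# Report the interpreter for every file named on the command line.
for my $path (@ARGV) {
    my $interp = interpreter_of_file($path);
    printf "%s: %s\n", $path, defined $interp ? $interp : 'no #! line';
}
```

The sketch does not handle options passed to env (such as `-S`), and it says nothing about compiled languages, so it can only complement a content-based check.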
Re: How best to identify & Categorize Source Code? by Arunbear (Prior) on Apr 01, 2008 at 18:40 UTC
There is File::Comments, though it is alpha software according to its docs. Alternatively, there is File for Windows, which may be useful if you're on Win32.
Re: How best to identify & Categorize Source Code? by Errto (Vicar) on Apr 01, 2008 at 19:36 UTC
Try File::Type. I've used it only a bit, but it at least claims to fix some of the problems with File::MMagic.