in reply to Re: Getting File Type using Regular Expressions
in thread Getting File Type using Regular Expressions
> Of course, Windows applications make a lot of assumptions about files based on their filenames (especially their extensions).

Yep, that's pretty dumb.
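A minimal Python sketch of what extension-based guessing amounts to, using the standard library's mimetypes.guess_type(). The helper name type_from_name is hypothetical. The point is that only the name is consulted: renaming holiday.png to holiday.txt changes the answer even though not a single byte of the file changed.

```python
import mimetypes


def type_from_name(filename):
    # Hypothetical helper: looks only at the filename's extension;
    # the file is never opened, so its contents play no role at all.
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


if __name__ == "__main__":
    print(type_from_name("holiday.png"))  # image/png
    print(type_from_name("holiday.txt"))  # text/plain
    print(type_from_name("holiday"))      # None (no extension, no guess)
```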
> You may want to check file type by investigating the contents, not the filename.

That makes some sense. But not a lot, because it's pretty hard to do.
> The first two or four bytes of most files are often a very good clue as to the file's type. These bytes are usually referred to as a "magic number."

<rant>
The Netpbm project uses several (related) file formats. Their magic numbers are "P1", "P2", "P3", "P4", "P5" and "P6". Looks simple. Looks extensible as well, doesn't it? If more formats are needed, just continue the numbering: "P7", "P8", "P9", "P10". Right? No. If a file starts with a "P" followed by a "1", regardless of what follows, file(1) thinks it's a "Netpbm PBM image text". Even if it's a plain text file that starts with the sentence "P100s of Samsung are really cool phones".

My point is that magic numbers suck just as badly as file extensions. Both work reasonably well in practice because people follow de facto standards. Windows uses file extensions almost exclusively. Unix (by which I mostly mean Unix tools) relies on both: some tools use magic numbers, some use file extensions, and some use both.
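A minimal Python sketch of the Netpbm pitfall described above: a naive check that looks only at the first two bytes, next to a slightly stricter one that also requires whitespace right after the magic number (which the Netpbm format descriptions call for). The signature table, the helper names (naive_netpbm, strict_netpbm, sniff, sniff_file) and the type labels are illustrative assumptions, not file(1)'s actual magic database.

```python
# Illustrative subset of well-known leading-byte signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
]

NETPBM_KINDS = {
    b"P1": "Netpbm PBM image, plain (ASCII)",
    b"P2": "Netpbm PGM image, plain (ASCII)",
    b"P3": "Netpbm PPM image, plain (ASCII)",
    b"P4": "Netpbm PBM image, raw",
    b"P5": "Netpbm PGM image, raw",
    b"P6": "Netpbm PPM image, raw",
}

WHITESPACE = b" \t\r\n"


def naive_netpbm(head):
    # Looks only at the first two bytes, so "P100s of ..." matches "P1".
    return NETPBM_KINDS.get(head[:2])


def strict_netpbm(head):
    # Also requires whitespace immediately after the two-byte magic number.
    # head[2] is an int; "int in bytes" tests for that byte value.
    if len(head) >= 3 and head[2] in WHITESPACE:
        return NETPBM_KINDS.get(head[:2])
    return None


def sniff(head):
    # Try the fixed signatures first, then the (stricter) Netpbm check.
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return strict_netpbm(head) or "unknown"


def sniff_file(path, size=16):
    # Read just the first few bytes of the file and classify them.
    with open(path, "rb") as fh:
        return sniff(fh.read(size))


if __name__ == "__main__":
    text = b"P100s of Samsung are really cool phones"
    print(naive_netpbm(text))             # Netpbm PBM image, plain (ASCII)
    print(strict_netpbm(text))            # None
    print(sniff(b"P6\n640 480\n255\n"))   # Netpbm PPM image, raw
```

Even the stricter check is only a heuristic: a text file that begins with "P1 is..." still comes out as a PBM image, which is exactly the complaint above. A magic number is a convention, not a guarantee.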
Abigail