in reply to Designing storage of uploaded files

Using a hash function to avoid namespace collisions won't work if you just hash the filename. If 'H' is your hash function, H(x) == H(y) whenever x == y, so identical filenames still produce identical hash values. To mix things up, hash the filename concatenated with something that changes between uploads, such as the return value of localtime (md5_hex($filename . localtime)). You could also use crypt() and pass it a fresh random salt each time a file is uploaded.
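Here is a minimal sketch of the salted-hash idea, assuming Digest::MD5 and Time::HiRes are available (both ship with modern Perls). The helper name unique_storage_name and the choice of salt (high-resolution time, process ID, and a random number) are my own assumptions, not anything from the original question. I added more than localtime because localtime only changes once per second, so two uploads of the same name within the same second would still collide.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Time::HiRes qw(time);

# Hypothetical helper: hash the uploaded filename together with values
# that differ on every call, so identical filenames get different names.
# Note: md5_hex() expects bytes; encode the filename first if it may
# contain wide (non-Latin-1) characters.
sub unique_storage_name {
    my ($filename) = @_;

    # Salt: sub-second timestamp, process ID, and a random integer.
    my $salt = join '-', time(), $$, int(rand(2**31));

    return md5_hex($filename . $salt);
}

my $first  = unique_storage_name('report.txt');
my $second = unique_storage_name('report.txt');
print "$first\n$second\n";
```

Both calls return a 32-character hex string, and the two strings differ even though the filename is the same. The flip side is that you cannot recompute the name later from the filename alone, so you would have to store it somewhere.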

Of course, this is all assuming you're intent on using hashes. However, I see no point in doing so. You will not be able to go backwards from the hashed value to the original filename. The only way to find a file again is for the user to enter its filename, so that your script can hash that and look for the hash value in the DB. And even that will only work if you do not add salt to the filename to prevent collisions. So a hash seems silly to me.

I would highly suggest -- especially for a large-scale project like this -- storing the files in SQL blobs instead of in real files. A malicious user can put whatever characters they want in the filename, but with an SQL implementation the name is just data in a column and never touches the filesystem (as long as you pass it to the database through placeholders). Using the hash is a noble way to create filenames that are "safe", but like I said, the hash is one-way, and it seems like you want the filename back. Plus, you really have to be on your guard when you let CGI scripts write to files, and especially when you let them create new files.

May I suggest altering your table so that the sample_id field is AUTO_INCREMENT -- let SQL take care of the primary key for you. This way, you can have multiple files with identical names; just always refer to them by their unique id (myscript.pl?file=42). You wouldn't have to try to avoid namespace collisions (unless there were other reasons for doing so). If you really need a directory structure for these files, create a sample_is_folder column (boolean) and a sample_parent column so you can set up a tree-ish structure. There would obviously be more to it than that, but hopefully you get the idea. Oh yes, and of course add a BLOB/LONGBLOB column for the uploaded files if you choose that route.
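To make the table idea concrete, here is a rough sketch, assuming MySQL and a connected DBI handle. Only sample_id, sample_is_folder and sample_parent come from the discussion above; the table name samples, the sample_name and sample_data columns, and the two helper subs are hypothetical names I made up for illustration.

```perl
use strict;
use warnings;

# Assumed MySQL schema. AUTO_INCREMENT hands out the primary key, so two
# uploads with the same name simply get two different sample_id values.
my $create_sql = <<'SQL';
CREATE TABLE samples (
    sample_id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    sample_name      VARCHAR(255) NOT NULL,
    sample_parent    INT UNSIGNED NULL,
    sample_is_folder TINYINT(1)   NOT NULL DEFAULT 0,
    sample_data      LONGBLOB     NULL
)
SQL

# Hypothetical helper: store one upload and return its new sample_id.
# $dbh is a connected DBI database handle. Placeholders keep the
# user-supplied filename and file contents out of the SQL text.
sub store_sample {
    my ($dbh, $name, $data) = @_;
    $dbh->do(
        'INSERT INTO samples (sample_name, sample_data) VALUES (?, ?)',
        undef, $name, $data,
    );
    return $dbh->last_insert_id(undef, undef, 'samples', 'sample_id');
}

# Hypothetical helper: look a file up by id, as in myscript.pl?file=42.
# Returns (sample_name, sample_data), or an empty list if no such row.
sub fetch_sample {
    my ($dbh, $id) = @_;
    return $dbh->selectrow_array(
        'SELECT sample_name, sample_data FROM samples WHERE sample_id = ?',
        undef, $id,
    );
}
```

You would run $create_sql once with $dbh->do($create_sql) after connecting via DBI->connect (with RaiseError set, so failures die instead of passing silently), then call store_sample() from the upload handler and fetch_sample() from the download script. Since the original filename lives in sample_name, you get it back without having to reverse anything.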

Good luck!