in reply to Storing uploaded files for speed and efficiency

I put this in the code archives as Directory Hashing Algorithm. Reprinted here for convenience.

The problem with directory "hashing" is you have to do it right or you're not solving any problems. Using substr() or XOR-math seems to create a pseudo-random distribution, but similar strings often end up getting filed together. That is, if someone uploads picture0001.jpg through picture9999.jpg, your "hash" might not distribute them very well, and performance will suffer.
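
To make the clustering concrete, here is a minimal sketch of a substr()-style scheme applied to exactly that picture0001.jpg case. The helper name naive_bucket and the choice of "first two pairs of characters" are assumptions made for illustration, not anyone's actual upload code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# naive_bucket - hypothetical helper: file a name under its first
# two pairs of characters, e.g. "picture0001.jpg" -> "pi/ct".
sub naive_bucket {
    my ($name) = @_;
    return join('/', substr($name, 0, 2), substr($name, 2, 2));
}

# Count how many distinct directories 9999 sequential names use.
my %seen;
$seen{ naive_bucket(sprintf('picture%04d.jpg', $_)) }++ for 1 .. 9999;

printf "%d files in %d distinct directories\n", 9999, scalar keys %seen;
# prints "9999 files in 1 distinct directories"
```

Every name shares the prefix "pict", so all 9999 files land in pi/ct and the extra directory levels buy nothing.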

Since MD5 is a fine piece of work, and I'm in no position to do better, I used the "secret" MD5 mode of the crypt function, which is selected by a salt beginning with '$1$' on systems whose crypt supports it. Despite the fact that I'm using the same "salt" every time, the output is quite random, and a change as small as one bit can create big waves, as any good hash should.
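
A rough way to see those "big waves" for yourself is to hash two names that differ in a single character and count how much of the result changes. This sketch assumes the core Digest::MD5 module rather than crypt(), so that it runs the same on every platform; differing_positions is a hypothetical helper written for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# differing_positions - hypothetical helper: count the positions at
# which two equal-length strings differ.
sub differing_positions {
    my ($x, $y) = @_;
    return scalar grep { substr($x, $_, 1) ne substr($y, $_, 1) }
                  0 .. length($x) - 1;
}

# Two names that differ in exactly one character.
my $first  = md5_hex('thisimage1.gif');
my $second = md5_hex('thisimage2.gif');

print "$first\n$second\n";
printf "%d of %d hex digits differ\n",
    differing_positions($first, $second), length $first;
```

With a good hash you should expect most of the 32 hex digits to differ, even though the inputs are nearly identical.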

qmail uses a similar technique to store e-mail messages: there can be thousands of these on any given server, and access time is paramount.

#!/usr/bin/perl -w
use strict;

# hashed - Create a hashed directory path for a given filename
#          using an MD5 hash generated by the crypt() function.
sub hashed
{
    my ($name) = @_;

    # Send the input into the grinder until it comes out the
    # right size. It shrinks blocks of up to 12 characters
    # into only two with each pass.
    while (length($name) > 4)
    {
        my $crypt = '';
        foreach ($name =~ /.{1,12}/gs)
        {
            # The salt plus its '$' delimiters fill the first 12
            # characters, so offset 12 is where the hash begins.
            $crypt .= substr(crypt($_, '$1$ABCDEFGH'), 12, 2);
        }
        $name = $crypt;
    }

    # Fix unruly characters, as crypt will gleefully return
    # '/' and '.' in the hashed strings. These are converted to
    # 'Z' and 'Y' so that no directory name holds a separator
    # or comes out as '..'.
    $name =~ y!/.!ZY!;

    # Split the returned string into a full path, including
    # the specified filename.
    return join('/', ($name =~ /../g), $_[0]);
}

print hashed("thisimage.gif"), "\n";
print hashed("thisimage.jpg"), "\n";
print hashed("thisimage1.gif"), "\n";
print hashed("thisimage2.gif"), "\n";

The output, for the curious, is:

0u/uh/thisimage.gif
VO/Bz/thisimage.jpg
68/g1/thisimage1.gif
dT/g1/thisimage2.gif

If you over-hash, you end up with far too many directories and not enough files. Feel free to tweak this to make it better fit your application.
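
As one possible starting point for that tweaking, here is a sketch where the depth and width of the tree are parameters. It is a variant, not the routine above: it swaps crypt() for the core Digest::MD5 module, and the name hashed_path and its $levels and $width arguments are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# hashed_path - hypothetical variant of hashed(): build $levels
# directory levels of $width hex digits each from the MD5 of the name.
sub hashed_path {
    my ($name, $levels, $width) = @_;
    $levels = 2 unless defined $levels;
    $width  = 2 unless defined $width;

    # An MD5 hex digest is only 32 characters long.
    die "levels * width must not exceed 32\n" if $levels * $width > 32;

    my $digest = md5_hex($name);
    my @dirs   = map { substr($digest, $_ * $width, $width) }
                 0 .. $levels - 1;
    return join('/', @dirs, $name);
}

print hashed_path('thisimage.gif'), "\n";        # 2 levels, 256 names each
print hashed_path('thisimage.gif', 1, 2), "\n";  # 1 level, 256 directories
print hashed_path('thisimage.gif', 3, 1), "\n";  # 3 levels, 16 names each
```

Hex digits give 16 possibilities per character, so two levels of two digits allow 65,536 leaf directories. For a few thousand files that is already over-hashing, and a single level of 256 is probably plenty.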