I put this in the code archives as Directory Hashing Algorithm. Reprinted here for convenience.

The problem with directory "hashing" is you have to do it right or you're not solving any problems. Using substr() or XOR-math seems to create a pseudo-random distribution, but similar strings often end up getting filed together. That is, if someone uploads picture0001.jpg through picture9999.jpg, your "hash" might not distribute them very well, and performance will suffer.
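To make the clumping concrete, here is a minimal sketch of a naive scheme. The `naive_bucket` helper is hypothetical and not part of the code archive entry; it simply files each name under its first two characters, which is one way substr() might be used.

```perl
use strict;
use warnings;

# Hypothetical naive scheme: bucket each file by the first two
# characters of its name.
sub naive_bucket
{
        my ($name) = @_;
        return substr($name, 0, 2);
}

# Count how many distinct buckets picture0001.jpg .. picture9999.jpg use.
my %buckets;
$buckets{ naive_bucket(sprintf("picture%04d.jpg", $_)) }++ for 1 .. 9999;

printf "%d files landed in %d bucket(s)\n", 9999, scalar(keys %buckets);
```

Every name starts with "pi", so all 9999 files share a single directory and nothing has been distributed at all.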

Since the MD5 hash is a fine piece of work, and I'm not in any position to achieve better, I used the "secret" MD5 mode of the crypt function, which the '$1$' salt prefix selects on systems whose crypt() supports it. Despite the fact that I'm using the same "salt" every time, the output is quite random, and a change as small as one bit creates big waves in the result, as it should with any good hash.
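The "big waves" effect can be seen with a small sketch. Note the assumption: this uses the core Digest::MD5 module directly rather than crypt(), since crypt() behaviour with a '$1$' salt varies by platform, so the digests shown are not the ones the directory code produces. The two file names are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# '0' (0x30) and '1' (0x31) differ in exactly one bit, so these two
# names are one bit apart.
my @names   = ('picture0000.jpg', 'picture0001.jpg');
my @digests = map { md5_hex($_) } @names;

# Count the hex positions at which the two digests disagree.
my $differing = grep { substr($digests[0], $_, 1) ne substr($digests[1], $_, 1) } 0 .. 31;

print "$names[$_] => $digests[$_]\n" for 0 .. 1;
print "$differing of 32 hex digits differ\n";
```

With a good hash, most of the 32 hex digits should differ even though the inputs are nearly identical, which is what keeps sequentially named files from piling up together.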

QMail uses a similar technique to store e-mail messages, since there can be thousands of them on any given server and access time is paramount.

#!/usr/bin/perl -w

use strict;

# hashed - Create a hashed directory path for a given filename
#          using an MD5 hash generated by the crypt() function.

sub hashed
{
        my ($name) = @_;

        # Send the input into the grinder until it comes out the
        # right size.  It shrinks blocks of up to 12 characters
        # into only two with each pass.

        while (length($name) > 4)
        {
                my $crypt;

                foreach ($name =~ /.{1,12}/gs)
                {
                        $crypt .= substr(crypt($_,'$1$ABCDEFGH'),12,2);
                }

                $name = $crypt;
        }

        # Fix unruly characters, as crypt will gleefully return
        # '/' and '.' in the hashed strings. These are converted
        # to 'Z' and 'Y' so that no path component can contain a
        # separator or come out as '..'.
        $name =~ y!/.!ZY!;

        # Split the returned string into a full path, including
        # the specified filename.
        return join ('/', ($name =~ /../g), $_[0]);
}

print hashed ("thisimage.gif"),"\n";
print hashed ("thisimage.jpg"),"\n";
print hashed ("thisimage1.gif"),"\n";
print hashed ("thisimage2.gif"),"\n";

The output, for the curious, is:

0u/uh/thisimage.gif
VO/Bz/thisimage.jpg
68/g1/thisimage1.gif
dT/g1/thisimage2.gif

If you over-hash, you end up with far too many directories and not enough files. Feel free to tweak this to make it better fit your application.
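As a rough guide when tweaking, the arithmetic can be sketched as follows. The assumptions here are mine: each hash character takes at most 64 distinct values (the crypt() output alphabet), each directory level uses two characters, and the figure of 10,000 uploads is purely hypothetical.

```perl
use strict;
use warnings;

# Assumption: each hash character takes at most 64 distinct values,
# and each directory level is named with two such characters.
my $chars_per_level = 2;
my $per_level       = 64 ** $chars_per_level;   # at most 4096 names per level

my $files = 10_000;    # hypothetical number of uploads

for my $levels (1, 2)
{
        # Upper bound on leaf directories at this depth.
        my $leaves = $per_level ** $levels;
        printf "%d level(s): up to %d leaf directories, about %.4f files each\n",
                $levels, $leaves, $files / $leaves;
}
```

Two levels allow for up to 16,777,216 leaf directories, so a collection of this size would leave almost every file alone in its own directory. Dropping to one level, or to fewer characters per level, may suit smaller collections better.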

In reply to Re: Storing uploaded files for speed and efficiency by tadman
in thread Storing uploaded files for speed and efficiency by legLess
