in reply to Need directory scheme to store 400,000 images

You'll probably want to md5 the name to more evenly distribute the prefix letters, then use one or two of the hex characters at each level until the final directories are small enough.
    use Digest::MD5 qw(md5_hex);
    my $name = "123456.jpg";
    my $path = md5_hex($name);
    $path =~ s#^(.)(.)(.).*#$1/$2/$3/$name#;
Three hex characters give 16**3 = 4,096 directories, so 400K files work out to roughly 100 files per directory. Cache::FileCache uses a similar scheme.
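
The same idea can be wrapped in a small sub so the number of levels is easy to change later. This is only a sketch: the name `hashed_path` and the `$levels` parameter are made up for illustration, and it assumes one hex character per level, as in the snippet above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not part of any module): build a relative path
# for $name using the first $levels hex characters of its MD5 digest,
# one character per directory level.
sub hashed_path {
    my ($name, $levels) = @_;
    $levels = 3 unless defined $levels;

    # md5_hex returns 32 lowercase hex characters; take the leading ones.
    my @dirs = split //, substr(md5_hex($name), 0, $levels);

    return join '/', @dirs, $name;
}

print hashed_path('123456.jpg'), "\n";      # three levels by default
print hashed_path('123456.jpg', 2), "\n";   # two levels
```

Because the directories are derived from the digest rather than from the name itself, short names such as `1.jpg` land at the same depth as long ones.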

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: •Re: Need directory scheme to store 400,000 images
by markjugg (Curate) on Apr 12, 2004 at 21:42 UTC
    We decided to use md5 over the other options. Some reasons included:
    • md5 creates a balanced tree. With the numeric scheme, directories would hold anywhere from 1 to 10,000 files, assuming 3 levels of directories in the form "1/2/3".
    • IDs of fewer than 3 characters get treated the same way as any other. (With the numeric scheme, those files ended up in higher-level directories or had to be padded with zeros.)
    • It's still not very hard to find a directory "by hand" if a programmer needs to do that. Using md5 on the ID on the command line will quickly reproduce the result. This would work as well:
      find ./uploads_dir -name '1234.jpg'

    I also found out we need to plan for more like 1.5 million images.
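
To see what the larger figure means for directory sizes, a rough calculation helps. The sketch below assumes one hex character per level and an even spread of digests; `files_per_dir` is a made-up name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: expected average number of files per leaf
# directory when each level uses one hex character (16 choices)
# and the hashes are evenly distributed.
sub files_per_dir {
    my ($total_files, $levels) = @_;
    return $total_files / 16**$levels;
}

for my $total (400_000, 1_500_000) {
    for my $levels (2 .. 4) {
        printf "%9d files, %d levels: %6d dirs, ~%.0f files each\n",
            $total, $levels, 16**$levels, files_per_dir($total, $levels);
    }
}
```

With three levels, 1.5 million images average about 366 files per directory; a fourth level brings that down to about 23. Either may be acceptable, depending on the filesystem.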