in reply to Recursion and Such

Just a suggestion, but if your intent is to spread files across a directory tree to improve access time (e.g. an older Linux system whose ext2 filesystem drags on directories with more than ~1000 entries), or you have an NFS server that doesn't allow more than n bytes' worth of filenames in an inode, you might make a tree of [[:hexdigit:]]+/[[:hexdigit:]]+/realfile, where the [[:hexdigit:]]+ chunks are derived from the realfile's name using Digest::MD5 or the like.

(Just to toss that idea out as I've had to deal with both the problems I mentioned in the past . . .)

Update: The OP contacted me out-of-band with questions about how exactly to apply this to their situation. I'll answer them here just in case anyone else was interested . . .

The way I've always used this in the past is to treat the "filename" as a key that gets run through MD5 to create the real on-disk filename. If you can regenerate the key easily (say it's a log file of the form "username-month-year"), you don't really need to keep the original filenames around; otherwise you'll want to keep a list of just the keys (possibly using DB_File, a real RDBMS, or a flat file) to use as a table of contents.

Whichever way you go, what you want to write is a key2path() routine: you pass in the key and get back the real on-disk path. This example goes two levels deep (so you need code to make directories 00 .. ff and then 00/00..ff, 01/00..ff, ...) and uses the hashed key as the final pathname component. There's no reason you couldn't use $key instead of $digest; the reason I didn't in the most recent case I used this was that it was the length of the original filename causing problems to begin with, and the key was readily available externally:

use Digest::MD5;

sub key2path {
    my $key    = shift;
    my $digest = Digest::MD5::md5_hex( $key );
    return substr( $digest, 0, 2 ) . "/"
         . substr( $digest, 2, 2 ) . "/"
         . $digest;
}
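Since the Update mentions both pre-creating the bucket directories and keeping the original keys around, here is a minimal sketch of how the pieces might fit together. The store_path() helper and base-directory layout are my own illustration, not part of the original post, and it creates bucket directories lazily with File::Path rather than making all 65,536 of them up front:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Path  qw(make_path);   # core module since Perl 5.10

# Same scheme as key2path above: hash the key and use the first two
# pairs of hex digits as the directory levels.
sub key2path {
    my $key    = shift;
    my $digest = md5_hex($key);
    return substr($digest, 0, 2) . "/"
         . substr($digest, 2, 2) . "/"
         . $digest;
}

# Hypothetical helper: turn a key into a full on-disk path under $base,
# creating the two bucket directories on demand instead of up front.
sub store_path {
    my ($base, $key) = @_;
    my $path = "$base/" . key2path($key);
    (my $dir = $path) =~ s{/[^/]+$}{};   # strip the filename component
    make_path($dir) unless -d $dir;
    return $path;
}

# e.g. a log-file key of the form "username-month-year"
print key2path('jsmith-01-2005'), "\n";
```

As the Update notes, if the key can't be regenerated on demand you'd also record it in a table of contents (DB_File, an RDBMS, or a flat file) so the stored files remain discoverable.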

Re^2: Recursion and Such
by Grundle (Scribe) on Jan 31, 2005 at 19:50 UTC
    I never even thought of that. Great suggestion! BTW - you hit the nail on the head about the problem. NFS isn't the best when you start dealing with large file sets under one directory, so of course the solution is to spread the files out and preserve speed.
Re^2: Recursion and Such
by Grundle (Scribe) on Feb 01, 2005 at 18:20 UTC
    In reply to the Update

    This seems to be a much simpler approach. It saves a lot of code and takes out the confusing recursion. Thanks for offering a viable alternative that was, in the end, much easier to implement.