removing a needle from a haystack

orthanc has asked for the wisdom of the Perl Monks concerning the following question:

hey ho ppl

This question is really about deleting a file from a directory containing a few thousand files.

Let's begin... I need to unlink one file, but the only data I have is the id that starts the filename. Thus I need to use filename globbing, which on a few thousand files takes some time.

my $path = "/path";
opendir(DIR, $path) || die("No opendir on $path\n");
my @files = grep { /^${id}\_.*$/ } readdir DIR;
closedir(DIR);

my $filename = "$path/$files[0]";
unlink($filename) || die("No unlink on $filename\n");

Please excuse the shoddy code and lack of error checking, but it's only an example.

Rather than having to grep the whole dir every time I want to delete a file, is there a simpler and, more importantly, faster way that I'm for some reason blind to?

Thanks for any help
Orthanc


Replies are listed 'Best First'.
Re: removing a needle from a haystack by jeroenes (Priest) on Feb 14, 2001 at 20:37 UTC
You don't need to do a readdir first. You can leave that up to Perl by using: `while( $file = <$id.''> ){ do_something( $file ); }`

The <> takes care of the filename expansion.

Hope this helps, Jeroen "We are not alone"(FZ)*

Update: You can read about it in glob and perlop. The latter makes me realize that you'll be better off with `while( $file = glob "$id*" ){ do_something( $file ); }`
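A minimal sketch of the glob approach applied to the original task, assuming the files are named `<id>_<something>` as in the question. The helper name `unlink_by_id` is hypothetical, and `bsd_glob` from the core File::Glob module is swapped in for plain `glob` so that whitespace in the path is not treated as a pattern separator.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: delete the first file in $dir whose name starts with "${id}_".
# Returns the deleted path, or undef if nothing matched.
sub unlink_by_id {
    my ($dir, $id) = @_;

    # Backslash-escape glob metacharacters so the id is matched literally.
    (my $safe_id = $id) =~ s/([\\*?\[\]{}~])/\\$1/g;

    # Take only the first expansion of the pattern.
    my ($match) = bsd_glob("$dir/${safe_id}_*");
    return unless defined $match;

    unlink($match) or die "No unlink on $match: $!\n";
    return $match;
}
```

Note that glob still has to scan the directory internally to expand the pattern, so this is shorter to write but not necessarily faster than a readdir loop.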
Re: removing a needle from a haystack by McD (Chaplain) on Feb 14, 2001 at 20:49 UTC
orthanc writes: "Thus I need to use filename globbing, which on a few thousand files takes some time."

This may or may not be an option for you, but consider making your directory structure hierarchical instead of flat. Flat directories over 1K files tend to get unreasonably slow, at least in my Linux/Solaris experience.

Incidentally, has anybody ever run performance numbers comparing `readdir` to `glob`? Is either faster?

Peace, -McD
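One way the hierarchical layout could look, as a sketch only: files are bucketed into subdirectories named after the last two characters of the id, so a lookup scans one small bucket instead of the whole tree. The layout and the names `bucket_dir`, `store_file`, and `find_by_id` are assumptions for illustration, not something from the thread.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical layout: $root/<last two characters of the id>/<filename>.
# Short ids are left-padded with zeros so every bucket name has two characters.
sub bucket_dir {
    my ($root, $id) = @_;
    return "$root/" . substr("00$id", -2);
}

# Create an empty file for $id in its bucket
# (stands in for whatever code writes the real files).
sub store_file {
    my ($root, $id, $suffix) = @_;
    my $dir = bucket_dir($root, $id);
    make_path($dir);
    my $path = "$dir/${id}_$suffix";
    open(my $fh, '>', $path) or die "No open on $path: $!\n";
    close($fh);
    return $path;
}

# Scan only the one bucket that can hold $id and stop at the first match.
# Returns the full path, or undef if the bucket or the file does not exist.
sub find_by_id {
    my ($root, $id) = @_;
    my $dir = bucket_dir($root, $id);
    opendir(my $dh, $dir) or return;
    my $found;
    while (defined(my $entry = readdir $dh)) {
        next unless index($entry, "${id}_") == 0;
        $found = "$dir/$entry";
        last;
    }
    closedir($dh);
    return $found;
}
```

Deleting then becomes `my $file = find_by_id($root, $id); unlink($file) if defined $file;`. With 100 buckets, a few thousand files works out to a few dozen entries per directory, assuming the ids are spread reasonably evenly over their last two characters.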
Re: removing a needle from a haystack by TheoPetersen (Priest) on Feb 14, 2001 at 21:29 UTC
grep is not the best choice for finding a single match in a large array; unfortunately Perl doesn't offer a function that is as simple as grep for this, so people use it anyway. Depending on the data, a `for` loop that bails out at the first match is usually faster. Since in this case we also want to avoid unnecessary readdir calls, a `while` loop is best:

opendir(DIR, $path) or die("No opendir on $path\n");
while (defined($_ = readdir DIR)) {
    if (/^${id}\_.*$/o) {
        $filename = "$path/$_";
        last;
    }
}
closedir(DIR);
if ($filename) {
    unlink($filename) or die("No unlink on $filename\n");
}
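For comparison, the List::Util module provides `first`, which reads much like grep but stops testing at the first match. A sketch, with the hypothetical helper name `first_match`:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: return the full path of the first entry in $path
# whose name starts with "${id}_", or undef if there is none.
sub first_match {
    my ($path, $id) = @_;
    opendir(my $dh, $path) or die "No opendir on $path: $!\n";

    # \Q...\E quotes any regex metacharacters in the id.
    my $name = first { /^\Q$id\E_/ } readdir $dh;
    closedir($dh);

    return defined $name ? "$path/$name" : undef;
}
```

The trade-off is that readdir in list context still returns every entry before `first` sees any of them, so this saves the extra pattern matches but not the directory reads. The `while` loop above remains the leaner option when the match tends to come early.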
Re: removing a needle from a haystack by Pahrohfit (Sexton) on Feb 14, 2001 at 22:37 UTC
Unless I'm missing something you're getting at, all you need to do is: `unlink </path/${id}_*>;`
Re: removing a needle from a haystack by binary* (Novice) on Feb 14, 2001 at 22:39 UTC
Orthanc: You don't really explain the application in enough detail to know if there might be a design change that could be made to provide the full filename. Why do you need to delete only one file? How do you find out which file to delete? How often does this come up? Could a naming convention in the directory help reduce the number of potential targets?
Re: removing a needle from a haystack by Anonymous Monk on Feb 15, 2001 at 03:07 UTC
If it's just one file, wouldn't it make sense to use ls instead of grep? As in @file = split(/\n/,`ls id*`);