orthanc has asked for the wisdom of the Perl Monks concerning the following question:

hey ho ppl

This question is really about deleting a file from a directory containing a few thousand files.

Let's begin... I need to unlink one file, but the only data I have is the id that starts the filename. Thus I need to use filename globbing, which takes some time on a few thousand files.

my $path = "/path";
opendir(DIR, $path) || die("No opendir on $path\n");
my @files = grep { /^${id}\_.*$/ } readdir DIR;
closedir(DIR);
my $filename = "$path/$files[0]";
unlink($filename) || die("No unlink on $filename\n");

Please excuse the shoddy code and lack of error checking, but it's only an example.

Rather than having to grep the whole dir every time I want to delete a file, is there a simpler and, more importantly, faster way that I'm for some reason blind to?

Thanks for any help
Orthanc

Replies are listed 'Best First'.
Re: removing a needle from a haystack
by jeroenes (Priest) on Feb 14, 2001 at 20:37 UTC
    You don't need to do a readdir first. You can leave that up to perl, by using:
    while( $file = <$id.'*'> ){ do_something( $file ); }

    The <> takes care of the filename expansion.

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)

    Update: You can read about it in glob and perlop. The latter makes me realize that you'll be better off with

    while( $file = glob "$id*" ){ do_something( $file ); }
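
    For the original task (remove one file whose name starts with the id), the glob approach above could be wrapped up roughly as follows. This is only a sketch: the name unlink_first_by_id is made up for illustration, and it assumes the directory and id contain no whitespace or glob metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): delete the first file in $dir
# whose name starts with "${id}_". Returns the deleted path, or undef if
# nothing matched.
sub unlink_first_by_id {
    my ($dir, $id) = @_;

    # List context makes glob return every match at once, so no
    # half-consumed iterator is left behind, as there would be after
    # leaving a scalar-context while loop early.
    # Assumption: $dir and $id hold no whitespace or glob metacharacters.
    my ($file) = glob("$dir/${id}_*");
    return unless defined $file;

    unlink($file) or die "No unlink on $file: $!\n";
    return $file;
}
```

    Note that glob still has to read the whole directory behind the scenes, so this is tidier than the readdir/grep version but not necessarily faster.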
Re: removing a needle from a haystack
by McD (Chaplain) on Feb 14, 2001 at 20:49 UTC
    orthanc writes:

    Thus I need to use filename globbing, which on a few thousand files takes some time.

    This may or may not be an option for you, but consider making your directory structure hierarchical instead of flat.

    Flat directories over 1K files tend to get unreasonably slow, at least in my Linux/Solaris experience.
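
    One way to picture a hierarchical layout: spread the files over subdirectories chosen from the id, so any lookup only touches a small directory. The sketch below is an assumption on my part, not something the original poster described; bucket_dir and the two-character bucket scheme are hypothetical, and it assumes ids are plain alphanumeric strings.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical layout: files live in $root/<bucket>/<id>_<rest>, where
# <bucket> is the last two characters of the id, left-padded with zeros
# for short ids. Sequential numeric ids then spread over 100 buckets.
sub bucket_dir {
    my ($root, $id) = @_;
    my $bucket = substr("00$id", -2);
    return File::Spec->catdir($root, $bucket);
}
```

    With this in place, a file for id 1234 would be looked up under the "34" subdirectory of the root, and a glob or readdir there only has to wade through roughly one hundredth of the files.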

    Incidentally, anybody ever run performance numbers comparing readdir to glob? Is either faster?
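
    For anyone who wants to measure this, here is a rough sketch using the core Benchmark module. The function names are made up for illustration, no results are claimed, and the numbers will depend heavily on the filesystem and on caching.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Early-exit readdir scan: stop at the first name starting with "${id}_".
sub find_with_readdir {
    my ($dir, $id) = @_;
    opendir(my $dh, $dir) or die "No opendir on $dir: $!\n";
    my $found;
    while (defined(my $name = readdir $dh)) {
        if (index($name, "${id}_") == 0) {
            $found = "$dir/$name";
            last;
        }
    }
    closedir($dh);
    return $found;
}

# glob in list context, keeping only the first (sorted) match.
# Assumption: $dir and $id hold no whitespace or glob metacharacters.
sub find_with_glob {
    my ($dir, $id) = @_;
    my ($found) = glob("$dir/${id}_*");
    return $found;
}

# Run each finder for at least one CPU second and print a comparison table.
sub compare_finders {
    my ($dir, $id) = @_;
    cmpthese(-1, {
        'readdir' => sub { find_with_readdir($dir, $id) },
        'glob'    => sub { find_with_glob($dir, $id) },
    });
}
```

    Calling compare_finders("/path", $id) against a directory of realistic size should give a first impression. Keep in mind that the two finders can return different files when several names match, since glob sorts its results and readdir does not.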

    Peace,
    -McD

Re: removing a needle from a haystack
by TheoPetersen (Priest) on Feb 14, 2001 at 21:29 UTC
    grep is not the best choice for finding a single match in a large array; unfortunately Perl doesn't offer a function that is as simple as grep for this, so people use it anyway.
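
    For what it's worth, the List::Util module (bundled with current Perl releases) has a first function that reads much like grep but stops at the first hit. A minimal sketch, assuming List::Util is installed; first_name_for_id is a made-up name:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Return the first name in @names that starts with "${id}_", or undef.
# first() stops testing as soon as the block returns true, whereas grep
# always walks the whole list. \Q...\E keeps regex metacharacters in the
# id from being interpreted.
sub first_name_for_id {
    my ($id, @names) = @_;
    return first { /^\Q$id\E_/ } @names;
}
```

    This only saves the pattern matches after the first hit; the list of names still has to be built in full before first can look at it.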

    Depending on the data, a for loop that bails out at the first match is usually faster. Since in this case we also want to avoid unnecessary readdir calls, a while loop is best:

    opendir(DIR, $path) or die("No opendir on $path\n");
    while (defined($_ = readdir DIR)) {
        if (/^${id}\_.*$/o) {
            $filename = "$path/$_";
            last;
        }
    }
    closedir(DIR);
    if ($filename) {
        unlink($filename) or die("No unlink on $filename\n");
    }
Re: removing a needle from a haystack
by Pahrohfit (Sexton) on Feb 14, 2001 at 22:37 UTC
    Unless I'm missing something you're getting at, all you need to do is:
    unlink </path/${id}_*>;
Re: removing a needle from a haystack
by binary* (Novice) on Feb 14, 2001 at 22:39 UTC
    Orthanc: You don't really explain the application in enough detail to know if there might be a design change that could be made to provide the full filename. Why do you need to delete only one file? How do you find out which file to delete? How often does this come up? Could a naming convention in the directory help reduce the number of potential targets?
Re: removing a needle from a haystack
by Anonymous Monk on Feb 15, 2001 at 03:07 UTC
    If it's just one file, wouldn't it make sense to use ls instead of grep? As in @file = split(/\n/,`ls /path/${id}_*`);