Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, if I had a cron script delete files every 4 hours, 6 times a day, would I have to be concerned about the script timing out if it had to delete 100,000-200,000 files of about 10 KB each? The files will never exceed 10 KB.

Also, is there anything I need to worry about when unlinking that many files at once? Should I set a timeout and perhaps delete 25,000 at a time, along the lines of the sketch below?
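Roughly, I'm imagining something like this. It's only a sketch: the directory name /var/spool/myapp/cache is a placeholder, and it assumes everything readdir finds there is a plain file that is safe to delete.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Placeholder directory; in reality this would be wherever the files live.
    my $dir        = '/var/spool/myapp/cache';
    my $batch_size = 25_000;

    opendir my $dh, $dir or die "Cannot open $dir: $!";

    my @batch;
    while ( defined( my $name = readdir $dh ) ) {
        next if $name =~ /\A\.\.?\z/;          # skip . and ..
        push @batch, "$dir/$name";
        delete_batch( \@batch ) if @batch >= $batch_size;
    }
    delete_batch( \@batch );                   # whatever is left over
    closedir $dh;

    sub delete_batch {
        my ($batch) = @_;
        return unless @$batch;
        my $deleted = unlink @$batch;          # unlink returns how many it removed
        warn "only deleted $deleted of ", scalar @$batch, " files\n"
            if $deleted != @$batch;
        @$batch = ();
    }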

Replies are listed 'Best First'.
Re: unlinking performance
by Zaxo (Archbishop) on Jun 10, 2007 at 06:11 UTC

    The file size shouldn't affect performance very much. The real answer will depend on what file system you use and how scattered over the disk your file list may be. A very long file list may trigger swap if you are short on memory.

    I don't think that 100-200k unlinks will take more than four hours, so long as you use Perl's built-in unlink instead of calling system rm (roughly as in the snippet below). Runtime will be faster on journaled file systems, but I can't swear to the speed of the background operations.
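    For instance, something along these lines, where @files is assumed to already hold the full paths you want gone; the glob over a made-up directory is only there to make the snippet self-contained:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @files = glob '/some/made/up/dir/*';

        # Slow: forks and execs /bin/rm once per file.
        # system( 'rm', '-f', $_ ) for @files;

        # Fast: one unlink() system call per file, no external processes.
        my $deleted = unlink @files;
        print "deleted $deleted of ", scalar @files, " files\n";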

    In short, we don't know enough about your code or your system to give better than general advice.

    After Compline,
    Zaxo

Re: unlinking performance
by blazar (Canon) on Jun 10, 2007 at 09:27 UTC
    Hi, if I had a cron script delete files every 4 hours, 6 times a day, would I have to be concerned about the script timing out if it had to delete 100,000-200,000 files of about 10 KB each? The files will never exceed 10 KB.

    The file sizes should be irrelevant on most filesystems I know of. But timed out by what? Do you mean overlapping with the next run? I doubt it. Testing on XP with NTFS, which I doubt is the most efficient setup by far:

        C:\temp>mkdir test
        C:\temp>cd test
        C:\temp\test>perl -e "for (1..100_000) { open my $f, '>', $_ or die $! }"
        C:\temp\test>perl -le "$n=time; unlink 1..100_000; print time-$n"
        55
      Just for interest, on Debian Lenny/Testing using ext3...

        zippy:~/scripts/tmp$ uname -a
        Linux zippy 2.6.18-4-amd64 #1 SMP Mon Mar 26 11:36:53 CEST 2007 x86_64 GNU/Linux
        zippy:~/scripts/tmp$ time perl -e 'for (1..100_000) { open my $f, ">", $_ or die $! }'

        real    9m21.670s
        user    0m2.480s
        sys     9m8.294s

        zippy:~/scripts/tmp$ time perl -le 'unlink 1..100_000'

        real    0m1.819s
        user    0m0.064s
        sys     0m1.428s
Re: unlinking performance
by leot (Sexton) on Jun 10, 2007 at 11:27 UTC
    I tried it with scripts like blazar's one-liners:
    1. Create 100000 (empty) files (crfile.pl):
        #!/usr/bin/perl
        use warnings;
        use strict;

        for (1..100000) {
            open my $file, '>', $_ or die $!;
        }
    2. Remove all 100000 files with Perl (rmfile.pl):
        #!/usr/bin/perl
        use warnings;
        use strict;

        my $tempo = time;
        unlink 1..100000;
        print time - $tempo . "\n";
    Now I create and remove all 100000 files with Perl:
        leonardo@bianconiglio:~/testfile$ ./crfile.pl
        leonardo@bianconiglio:~/testfile$ ./rmfile.pl
        12
    I'm under a Debian GNU/Linux system and the FS is ReiserFS.
    --leot
Re: unlinking performance
by Anonymous Monk on Jun 11, 2007 at 08:11 UTC
    I don't know if it's applicable here, but another solution would be to create a big file (around 5 GB in your case, ~524288 * 10 * 1024 bytes, which is roughly twice the maximum space you describe), format it as a filesystem, and mount it. Do your work with the small files inside it, and once you're finished you only need to delete one file... or quick-format it. See the sketch below.
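    On Linux that would amount to a loopback-mounted filesystem image. A rough sketch of the setup, assuming root privileges and made-up paths (/var/cache.img and /mnt/scratch), written as a Perl wrapper around the usual dd/mkfs/mount commands:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Made-up paths; needs root for mkfs and mount.
        my $image = '/var/cache.img';
        my $mount = '/mnt/scratch';

        # 1. Create a ~5 GB image file full of zeros.
        system( 'dd', 'if=/dev/zero', "of=$image", 'bs=1M', 'count=5120' ) == 0
            or die "dd failed: $?";

        # 2. Put a filesystem on it (-F: don't complain that it isn't a block device).
        system( 'mkfs.ext3', '-F', $image ) == 0 or die "mkfs failed: $?";

        # 3. Mount it via the loop device; the small files then live under $mount.
        mkdir $mount unless -d $mount;
        system( 'mount', '-o', 'loop', $image, $mount ) == 0 or die "mount failed: $?";

        # Later, instead of unlinking 200,000 files one by one:
        #   system( 'umount', $mount );
        #   unlink $image;    # one delete, or re-run mkfs to "quick format"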