longjohnsilver has asked for the wisdom of the Perl Monks concerning the following question:

Hi All Monks, I'm a Perl newbie in need of enlightenment. I've been trying to put together some Perl code that performs the following steps on a Linux filesystem containing gzipped logfiles named in this format:
logfile_22102008_1051.txt.gz
1. Check whether the filesystem usage has grown beyond 10G.
2. If the usage is > 10G, delete the files from the oldest day.
3. If the filesystem is still over 10G after this step, delete the next-oldest day's files, and so on, until we get under the 10G threshold; at that point just exit.
Here's my code snippet. Please note that I haven't followed any Perl books so far; I've just tried to do my best with the notions I've collected on the web:
--
my $LOGSIZE = `du -s $LOGSTR | /bin/awk '{print \$1}'`;
chomp( $LOGSTR );
my $I = 5;
my $S;

# Up to 10G
while ( $LOGSIZE > 10000000 && $I >= -1 ) {
    if ( $I <= -1 ) {
        $S = "";
    }
    else {
        $S = "+";
    }
    my $LOGSIZE = `du -s $LOGSTR | /bin/awk '{print \$1}'`;
    chomp( $LOGSTR );
    `/usr/bin/find $LOGSTR -name "*.gz" -print -mtime $S$I -exec rm \{\} \\\;`;
    sleep 2;
    $I--;
}
---
The files are all about the same size, but their number may vary a lot. Since my code doesn't seem to work (it deletes all my .gz files), my question is: how can I delete the "oldest day" files at each step of the filesystem size check? Is this the best approach? I have to keep the filesystem almost full to retain as much information as possible, but I necessarily need to delete the oldest files because new information arrives continuously. I hope my explanation is clear enough. Thx, Francesco

Replies are listed 'Best First'.
Re: Deleting Oldest Logfiles
by moritz (Cardinal) on Oct 22, 2008 at 15:48 UTC
    First I'd use df to get the used space of a partition (du recursively walks the directory structure, which is rather expensive).
    use strict;
    use warnings;

    sub partition_usage {
        my $partition = shift;
        my $line = (`/bin/df -B 1024 $partition`)[-1];
        return 1024 * (split(/\s+/, $line))[2];
    }

    # test it:
    print partition_usage('/home/'), "\n";

    Then get a list of the log files you might want to delete, and sort them by modification time. This example assumes that they are all in the current directory:

    my @files = glob 'logfile_*.log.gz';
    @files = reverse sort { -M $a <=> -M $b } @files;

    See the documentation of -M and sort for more details.
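
    For a quick feel for what -M reports, here's a tiny illustration (using the OP's example filename):

    # -M returns the file's age in days, measured from script start time,
    # so older files yield larger values:
    printf "%s is %.1f days old\n",
        'logfile_22102008_1051.txt.gz',
        -M 'logfile_22102008_1051.txt.gz';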

    Then it's just a matter of walking through these files, deleting until you no longer exceed your size limit:

    my $limit = 10 * 1024**3;    # 10 GB

    for my $filename (@files) {
        last if partition_usage('/var/log/') < $limit;
        unlink $filename
            or warn "Can't delete file '$filename': $!";
    }
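
    Putting the three pieces together, here's an untested end-to-end sketch (the partition path and the glob pattern are assumptions -- adjust both to your setup):

    use strict;
    use warnings;

    sub partition_usage {
        my $partition = shift;
        my $line = (`/bin/df -B 1024 $partition`)[-1];
        return 1024 * (split(/\s+/, $line))[2];
    }

    my $limit = 10 * 1024**3;    # 10 GB

    # oldest first, so the loop below deletes the oldest files first
    my @files = reverse sort { -M $a <=> -M $b } glob 'logfile_*.log.gz';

    for my $filename (@files) {
        last if partition_usage('/var/log/') < $limit;
        unlink $filename or warn "Can't delete file '$filename': $!";
    }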
      Hi Monks,

      Firstly, thanks a lot for all your help, and shame on me for that error in the first chomp snippet. Moritz, thanks to you especially, because the code you sent seems very elegant and functional for solving my problem. I decided to put it in my script. Anyway, since I'm trying to learn Perl, there are two obscure points in your code which I still haven't grasped completely, and I'd be glad if you or somebody else in this monastery could explain them to me.

      1. The -1 and the 2 within square brackets inside the partition_usage sub. What is their exact purpose?
      2. If I had to glob for zipped logfiles within subdirectories, how could I do it?

      Thx Again,

      Francesco
        1. The -1 and the 2 within square brackets inside the partition_usage sub. What is their exact purpose?

        The constructs before these square brackets return lists. [2] picks the third item of such a list (remember, indexes are zero-based), which here is the third whitespace-delimited field of df's output; [-1] picks the last item, here the last line that the df command printed.
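
        A minimal illustration with made-up data:

        my @list = ('a', 'b', 'c', 'd');
        print $list[2], "\n";     # 'c' -- the third element (zero-based)
        print $list[-1], "\n";    # 'd' -- the last element

        # a list slice works the same way on the result of an expression:
        print +(split /\s+/, 'foo bar baz')[-1], "\n";    # 'baz'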

        2. If I had to glob for zipped logfiles within subdirectories, how could I do it?

        If you have a fixed depth, say 2, you could write */*/*.log.gz. If not, you'd have to use File::Find (or File::Find::Rule, which beginners seem to like better); see the sketch below.
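
        For instance, here is a minimal, untested sketch using the core File::Find module (the start directory is an assumption -- adjust it to your setup):

        use strict;
        use warnings;
        use File::Find;

        my @zipped;
        find(
            sub {
                # $File::Find::name holds the full path of the current file
                push @zipped, $File::Find::name if -f && /\.gz\z/;
            },
            '/var/log/',    # assumed start directory
        );
        print "$_\n" for @zipped;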

Re: Deleting Oldest Logfiles
by Illuminatus (Curate) on Oct 22, 2008 at 15:32 UTC
    Also, you declare a second my $LOGSIZE inside your loop. That inner variable is a new lexical that goes out of scope at the end of each iteration, so the outer $LOGSIZE tested by the while condition never changes.
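
    A stripped-down illustration of that scoping trap:

    my $size = 100;
    while ( $size > 10 ) {    # always tests the *outer* $size
        my $size = 5;         # a new lexical that shadows the outer one
        print "inner: $size\n";
        last;                 # bail out; otherwise this would loop forever
    }
    print "outer: $size\n";   # still 100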

    It might be a little more efficient to do something like:

    open LIST, "ls -1t $LOGSTR |" or die "could not ls: $!";
    my @Files = <LIST>;
    close LIST;
    Then you have the list to delete in sorted (newest-first) order, and you can unlink files from the end of it as needed.

      Don't forget to chomp the incoming list. Also, if all you do is read the results immediately and use them directly, backticks are simpler.

      chomp( my @Files = `ls -1t $LOGSTR` );
      die "No files?" if ! @Files;

      I agree with the idea of using ls to get a sorted list of files. It's a lot easier than a manual stat and sort (though there's probably a module to do it for you).
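
      For the record, the manual version isn't too bad either. An untested sketch, assuming $LOGSTR holds the log directory as in the original post:

      # newest first, like `ls -1t`: field 9 of stat() is the mtime
      opendir my $dh, $LOGSTR or die "Can't opendir $LOGSTR: $!";
      my @Files = sort { (stat "$LOGSTR/$b")[9] <=> (stat "$LOGSTR/$a")[9] }
                  grep { -f "$LOGSTR/$_" }
                  readdir $dh;
      closedir $dh;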

Re: Deleting Oldest Logfiles
by kyle (Abbot) on Oct 22, 2008 at 15:57 UTC

    What you want to do, I might do this way (with thanks to Illuminatus for the suggested use of "ls -1t"):

    chomp( my @Files = `ls -1t $LOGSTR` );
    die "No files?" if ! @Files;

    # Up to 10G
    while ( logsize() > 10_000_000 ) {
        my $doomed = pop @Files;
        warn "DELETE $doomed\n";
        unlink "$LOGSTR/$doomed"
            or warn "Can't unlink '$LOGSTR/$doomed': $!";
        sleep 2;
    }

    sub logsize {
        my $LOGSIZE = `du -s $LOGSTR`;
        $LOGSIZE =~ s{
            \A         # start of string
            ( \d+ )    # any number of digits [$1]
            \s         # white space
            .*         # anything
            \z         # end of string
        }{$1}xms;
        return $LOGSIZE;
    }

    I haven't tested this, however, so use with caution.

Re: Deleting Oldest Logfiles
by toolic (Bishop) on Oct 22, 2008 at 15:25 UTC