in reply to Whats a good process for archiving cron processed files?

I'm not entirely sure what your current process is, but it seems like you instinctively know there's a better way than what you're currently doing. So, allow me to walk you through the way I've processed and archived various syslog files.

For syslog files, or any files that are constantly being written to, it's not difficult to "rotate" them under Unix; you simply need to know the accepted standard practice:

  1. move the file that is currently being written to, giving it a new name
  2. signal syslog that it needs to reopen its log files
  3. process the 'new name' file, and archive it as you wish

Under Unix, a process keeps writing to the file it has open even after that file has been renamed, because its open file handle refers to the file itself rather than to the name. It only switches when it is told to close the current file and reopen the original file name(s). (This holds as long as the move is a rename on the same filesystem.) So the process above guarantees that you won't lose any messages while you're moving the files around.
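
Here is a rough sketch of steps 1 and 2, plus a small demonstration of why no messages are lost. The rotate_log helper and the file names are made up for illustration, and an ordinary file handle stands in for syslogd; with a real daemon you would pass its PID so that it gets a HUP, which is the signal syslogd conventionally treats as "reopen your log files".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Handle;

# Hypothetical helper: rename a live log file and, if a PID is given,
# send that process a HUP so it reopens the original name.
sub rotate_log {
    my ($log, $rotated, $pid) = @_;
    rename $log, $rotated or die "Can't rename $log to $rotated: $!\n";
    if (defined $pid) {
        kill 'HUP', $pid or die "Can't signal process $pid: $!\n";
    }
    return $rotated;
}

# Demonstration: a plain file handle plays the part of the logging daemon.
my $dir = tempdir(CLEANUP => 1);
my $log = "$dir/local7";

open my $writer, '>>', $log or die "Can't open $log: $!\n";
$writer->autoflush(1);
print {$writer} "written before the rename\n";

my $rotated = rotate_log($log, "$log.20050701");

# The writer has not reopened anything, so this lands in the renamed file.
print {$writer} "written after the rename\n";
close $writer;

open my $in, '<', $rotated or die "Can't open $rotated: $!\n";
my @lines = <$in>;
close $in;

print scalar(@lines), " lines ended up in $rotated\n";
```

Once the writer has reopened its files, the renamed file no longer changes and is safe to process.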

So, now we have a static (non-changing over time) file to process and archive. My processing scripts include a step to gzip that file. They also take a command line argument that says whether to actually archive the file, so I can process the current syslog files without archiving them if I need to. And I save the archive with a dated name, as you've indicated you do.
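
As a sketch of that "archive or not" switch (this is not my actual processing script): the process_log helper below is invented for illustration, counting lines stands in for the real processing, and the flag is a plain function argument where a real script would set it from a command line option. It uses the core IO::Compress::Gzip module instead of calling out to gzip.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX qw(strftime);
use Time::Local qw(timelocal);
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: "process" a rotated log (here: count its lines)
# and, only if $archive is true, gzip it to base.YYYYMMDD.gz.
sub process_log {
    my ($file, $base, $archive, $when) = @_;

    open my $in, '<', $file or die "Can't read $file: $!\n";
    my $count = 0;
    while (my $line = <$in>) {
        $count++;
    }
    close $in;

    return ($count, undef) unless $archive;

    my $stamp = strftime('%Y%m%d', localtime($when));
    my $dated = "$base.$stamp.gz";
    gzip($file => $dated) or die "gzip failed: $GzipError\n";
    unlink $file or die "Can't remove $file: $!\n";
    return ($count, $dated);
}

# Demonstration on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/local7.rotated";
open my $fh, '>', $file or die "Can't write $file: $!\n";
print {$fh} "line 1\nline 2\n";
close $fh;

my $when = timelocal(0, 0, 12, 1, 6, 2005);    # noon, 1 July 2005, local time

my ($n1, $a1) = process_log($file, "$dir/local7", 0, $when);  # report only
my ($n2, $a2) = process_log($file, "$dir/local7", 1, $when);  # report and archive

print "$n2 lines processed, archived as $a2\n";
```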

At the end of the month, I have 30ish files with names like local7.20050701.gz local7.20050702.gz ... local7.20050731.gz. On the first of every month, another process called "archive-month" runs. Since I like the whole month's worth of information in one file, I gzcat them all into a local7.200507 file and then gzip that. You could instead unzip them all, tar them, and gzip the tar file. For text files especially, you want to tar up the uncompressed files and then compress the tar file; you don't want a tar file with gzipped files inside it, because the compression won't be nearly as good.
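
The monthly step can also be sketched in pure Perl. This is an alternative to the gzcat pipeline, not what my script does: combine_month is a made-up name, and it relies on the core IO::Compress::Gzip and IO::Uncompress::Gunzip modules. It decompresses each daily file in name order and writes everything into a single compressed monthly file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: merge base.YYYYMMDD.gz files into base.YYYYMM.gz.
sub combine_month {
    my ($base, $yyyymm) = @_;

    # Sorted names are in date order because the dates are zero padded.
    my @daily = sort glob "$base.$yyyymm??.gz";
    die "No files matching $base.$yyyymm??.gz to archive\n" unless @daily;

    my $monthly = "$base.$yyyymm.gz";
    my $out = IO::Compress::Gzip->new($monthly, Level => 9)
        or die "Can't write $monthly: $GzipError\n";

    for my $file (@daily) {
        my $in = IO::Uncompress::Gunzip->new($file, MultiStream => 1)
            or die "Can't read $file: $GunzipError\n";
        while (defined(my $line = $in->getline)) {
            $out->print($line);
        }
        $in->close;
    }
    $out->close or die "Can't finish $monthly: $GzipError\n";
    return $monthly;
}

# Demonstration with two tiny "daily" files.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/local7";
gzip(\"day one\n" => "$base.20050701.gz") or die "gzip failed: $GzipError\n";
gzip(\"day two\n" => "$base.20050702.gz") or die "gzip failed: $GzipError\n";

my $monthly = combine_month($base, '200507');

my $text;
gunzip($monthly => \$text) or die "gunzip failed: $GunzipError\n";
print $text;
```

Note that this sketch leaves the daily files in place; removing them is only safe after the monthly file has been checked.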

I'll include my archive-month script in a little bit; I need to clean it up :-)

-Scott

Update: gzipped files do have the .gz extension; I originally forgot to add those to the names above.

Update2: Code follows:

#!/usr/bin/perl
#
# archive-month.pl
#
# Purpose:
#   Combine all of last month's log files into one file.
#
##############
use strict;
use warnings;

use File::Basename;
use Getopt::Std;
use vars qw($Ver $Year $Month $opt_d $opt_f $opt_h $opt_v);

$Ver = "0.1.2";

my $GZIP  = "/usr/bin/gzip -9";
my $GCAT  = "/usr/bin/gzcat";
my $GTEST = "/usr/bin/gzip -t";
my $RM    = "/usr/bin/rm";
my $CP    = "/usr/bin/cp";
my $MV    = "/usr/bin/mv";

sub Usage {
    print <<USAGE;

Usage: $0 [-hv] [-d directory] -f base-filename

    -f <required> base filename for files to archive into monthly file

Options:
    -d  directory to cd into; where target files are located
    -h  print this message
    -v  print version which is $Ver

USAGE
}

###############################################
#allow options to be intermixed with filenames
###############################################
{
    my @hold;
    while(@ARGV) {
        #if it doesn't start with a '-', assume it's not a cmd line option
        #(check @ARGV first so we never test an undefined element)
        while(@ARGV && $ARGV[0] =~ /^[^\-]/) {
            push @hold, shift @ARGV;
        }
        getopts('d:f:hv');
    }
    @ARGV=@hold;
}
###############################################

if( $opt_v ) {
    print " $0 version $Ver\n";
    exit 0;
}
if( $opt_h ) {
    &Usage;
    exit 0;
}
if(! defined $opt_f) {
    print " Must have basefile name of zipped logs\n";
    &Usage;
    exit 1;
}

#######################################################################
# Beginning of Program
#######################################################################
my ($date,$time) = SetTime();
print "Monthly archive started at $date $time\n\n";

#
# The month is zero based, so this month is $mon+1, last month is $mon
# unless $mon=0, in which case last month is $mon+12
#
my $lastmonth;
if($Month==0) {
    $lastmonth = sprintf("%04d%02d",$Year-1,$Month+12);
} else {
    $lastmonth = sprintf("%04d%02d",$Year,$Month);
}

# -d is optional, so guard against it being undefined
if(defined $opt_d && $opt_d ne '') {
    chdir $opt_d or die "Can't cd into $opt_d: <$!>\n";
}

if ( -e "$opt_f.$lastmonth.gz" || -e "$opt_f.$lastmonth" ) {
    die "Archive for $opt_f.$lastmonth already exists!\n";
}

{
    my @junk = glob "$opt_f.$lastmonth??.gz";
    if (scalar (@junk) == 0) {
        print join "", `ls -alF $opt_f.*.gz`;
        die "No files matching $opt_f.$lastmonth??.gz to archive\n";
    }
}

# Concatenate the daily files and recompress them as one monthly file
dosystem("$GCAT $opt_f.$lastmonth??.gz | $GZIP > $opt_f.$lastmonth.gz");

if ( -z "$opt_f.$lastmonth.gz" ) {
    dosystem("$RM $opt_f.$lastmonth.gz");
    die "Archive was empty - removed\n";
}

if(dosystem("$GTEST $opt_f.$lastmonth.gz") != 0) {
    dosystem("$RM $opt_f.$lastmonth.gz");
    die "Archive was corrupt - removed\n";
}

# The monthly archive passed its checks, so the daily files can go
dosystem("$RM $opt_f.$lastmonth??.gz");

# In January, move last year's monthly archives into their own directory
if($Month==0) {
    my $lastyear = sprintf("%04d",$Year-1);
    mkdir $lastyear, 0755;    # the mode must be octal
    dosystem("$MV $opt_f.$lastyear* $lastyear");
}

######################################################################
($date,$time) = SetTime();
print "\nMonthly archive finished at $time\n";
exit;

# Run a command and decode its wait status: returns the exit code for a
# normal exit, and exits the script if the command could not be started
# or was killed by a signal.
sub dosystem {
    my $rc = system(@_) & 0xFFFF;
    if ($rc == 0) {
        return $rc;
    }
    if ($rc == 0xff00 ) {
        print "Command <@_> failed: $!\n";
        exit 1;
    }
    if ($rc > 0x80) {
        $rc >>= 8;
        print "Command <@_> ran with exit code $rc\n";
        return $rc;
    }
    if ($rc & 0x80) {
        printf "Core dumped from signal %d\n", $rc & 0x7F;
        exit 2;
    } else {
        printf "Interrupted by signal %d\n", $rc & 0x7F;
        exit 3;
    }
    die "Should never be able to get here, \$rc was <$rc>\n System error, if any, was <$!>\n";
}

#######################################################################
# Set time variables to current time.
#######################################################################
sub SetTime {
    my ($sec,$min,$hr,$mday,$mon,$yr,$wday,$yday,$isdst) = localtime(time());
    my $date = sprintf("%04d%02d%02d",$yr+1900,$mon+1,$mday);
    my $time = sprintf("%02d:%02d:%02d",$hr,$min,$sec);

    #Side effects, I need the year and the month for other stuff, set the
    #global variables to the appropriate values
    $Year  = $yr + 1900;
    $Month = $mon;

    return ($date, $time);
}
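
The "last month" arithmetic is the easiest part of the script to get wrong, so here it is pulled out on its own as a sketch. The last_month function is a made-up name and is not part of archive-month.pl; it takes a four-digit year and the zero-based month that localtime returns.

```perl
use strict;
use warnings;

# Hypothetical helper: return the previous month as a YYYYMM string.
# $mon is zero based (0 = January), so for any month but January the
# previous month's one-based number is simply $mon.
sub last_month {
    my ($year, $mon) = @_;
    return $mon == 0
        ? sprintf("%04d12", $year - 1)        # January: December of last year
        : sprintf("%04d%02d", $year, $mon);
}

my ($mon, $yr) = (localtime)[4, 5];
print "Last month was ", last_month($yr + 1900, $mon), "\n";
```

For example, a run on 1 August 2005 sees $mon == 7 and gets 200507, which matches the local7.200507 names used above.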