Takes an Apache log and splits it up into a number of log files, one for each day on which traffic took place. (Eventually I'm going to rework it to use Date::Manip and output the week number as well.)
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

## Just a small script to split Apache logs up into days. It should work for
## any log that has its date and time in the same format as Apache:
## [dd/Mon/yyyy:hh:mm:ss zone]
## Nothing particularly fancy.

## Date extraction from Apache log lines (combined format).
## Returns the "dd/Mon/yyyy" part of the [timestamp], or undef if there is none.
sub get_date_from_log_line {
    my $line = shift;
    my ($dateline) = $line =~ m/\[(.+?)\]/ or return;
    my ($date) = split /:/, $dateline;
    return $date;
}

## Basic variable setup.
my $usage = "Two arguments please: log file to be split, and where to put the split files\n";
my $pathname        = shift(@ARGV) or die $usage;
my $final_directory = shift(@ARGV) or die $usage;

open(my $in, '<', $pathname) or die "Cannot open $pathname: $!\n";

my ($out, $date_last);
while (my $line = <$in>) {
    my $date = get_date_from_log_line($line);
    next unless defined $date;    # skip lines with no [timestamp]

    # The date changed: close the previous day's file and start a new one,
    # named after the date with the slashes removed (e.g. 10Oct2000.log).
    if (!defined $date_last or $date ne $date_last) {
        close($out) if $out;
        (my $timeStamp = $date) =~ s{/}{}g;
        my $outputfile = File::Spec->catfile($final_directory, "$timeStamp.log");
        open($out, '>', $outputfile) or die "Cannot write $outputfile: $!\n";
        $date_last = $date;
    }
    print $out $line;
}
close($out) if $out;
close($in);
print "Files split.\n";
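The intro mentions reworking the script to output the week number with Date::Manip. As a rough sketch of that idea, the hypothetical helper below swaps in the core Time::Piece module instead of Date::Manip (an assumption, not the author's plan): it parses the dd/Mon/yyyy string that get_date_from_log_line() returns and gives back an ISO date plus an ISO 8601 week number.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper (not part of the original script): convert an Apache
# log date such as "10/Oct/2000" into an ISO date ("2000-10-10") and an
# ISO 8601 week number, using core Time::Piece rather than Date::Manip.
sub date_and_week {
    my $date = shift;
    my $t = Time::Piece->strptime($date, '%d/%b/%Y');
    return ($t->ymd, $t->week);
}
```

For example, date_and_week('10/Oct/2000') returns ('2000-10-10', 41), so a file name like 2000-10-10-W41.log could be built from the two values.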

Update: revised, and removed the .* from the date regex.

Re: apache log splitter (bug)
by humanclock (Initiate) on Oct 03, 2009 at 06:15 UTC

    This code assumes that the log file is in ascending order, which is not always the case at midnight on higher-traffic websites. A line or two with the previous day's timestamp can still show up in the log file during the first minute of the new day.

    And since this script creates each day's file with > (truncating it) rather than appending to an existing one, a single out-of-order line destroys what was already written for that entire day: the late line reopens and truncates the previous day's file, and the next new-day line then truncates the new day's file as well.
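One way to address the problem described in this reply, sketched here under the assumption that the whole log is processed in a single run: remember which days have already been opened during this run, truncate a day's file only the first time that day appears, and reopen it in append mode if it turns up again. The subroutine name split_log_by_day is hypothetical; it reads from any filehandle and names each file after the date with the slashes removed (e.g. 10Oct2000.log).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: split the log lines read from $in into one file per
# day under $dir. A day's file is truncated ('>') only the first time that
# day is seen in this run; if the day shows up again later (e.g. a late
# 23:59:59 line written after midnight), the file is reopened in append
# mode ('>>') instead of being overwritten. Returns the days seen.
sub split_log_by_day {
    my ($in, $dir) = @_;
    my (%seen, $out, $current);
    while (my $line = <$in>) {
        # Capture the "dd/Mon/yyyy" part between "[" and the first ":".
        my ($date) = $line =~ m{\[([^:\]]+):} or next;
        if (!defined $current or $date ne $current) {
            close($out) if $out;
            (my $stamp = $date) =~ s{/}{}g;
            my $file = File::Spec->catfile($dir, "$stamp.log");
            my $mode = $seen{$date}++ ? '>>' : '>';
            open($out, $mode, $file) or die "Cannot open $file: $!\n";
            $current = $date;
        }
        print $out $line;
    }
    close($out) if $out;
    return sort keys %seen;
}
```

Call it as, for example, open(my $in, '<', $pathname) or die $!; split_log_by_day($in, $final_directory);. Only one output file is open at a time, so a log covering many days won't hit open-file limits. The caveat is that if one day is spread over two input files (say, after log rotation) and they are split in separate runs, the second run will still truncate that day's file.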