You can use the tell() function to get the current file offset, and the seek() function to return to that offset once you've reopened the file.
use strict;
use warnings;

my $LastPosition = 0;  ## obtain $LastPosition (e.g. read it from a state file)

open(LOG, '<', '/var/log/messages') or die "Cannot open log: $!";
seek(LOG, $LastPosition, 0);
while (my $line = <LOG>) {
    chomp $line;
    ## Process $line;
}
$LastPosition = tell(LOG);
close(LOG);
## store $LastPosition (e.g. write it back to the state file)
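For instance, those two placeholder comments could be filled in with a small state file. This is only a minimal sketch; the path and helper names are my own, not part of the original post:

my $statefile = '/var/run/logpos.state';   # assumed location

sub load_position {
    open(my $fh, '<', $statefile) or return 0;   # first run: start at byte 0
    my $pos = <$fh>;
    close $fh;
    return 0 unless defined $pos;
    chomp $pos;
    return $pos || 0;
}

sub save_position {
    my ($pos) = @_;
    open(my $fh, '>', $statefile) or die "Cannot write $statefile: $!";
    print $fh "$pos\n";
    close $fh;
}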
# Close the log file and save information for next log check.
# Remember where (and when) we stopped reading for next time.
$FTPTrkLastReadPos = tell(FTPLOG);
$FTPTrkLastReadTime = time;
close(FTPLOG);
and
# Get current characteristics of the log file
# (stat(_) reuses the buffer from the most recent stat or file test)
my ($filesize, $modified, $created) = (stat(_))[7,9,10];
$FTPTrkLastReadPos = 0 if( $filesize < $FTPTrkLastReadPos );
# Process any and all new entries
seek FTPLOG, $FTPTrkLastReadPos, 0;
Why do I check the file's size before reading it? Because someone may have deleted, shrunk, or truncated the file since I last read it. In my case I can handle this gracefully because I can look at the timestamps in each log entry and see whether I have processed them before.
my $dhcplog = '/var/log/dhcpd';
my $lastfile = '/var/run/statdhcpcron.last';
my $logfile;
open $logfile, '<', $dhcplog or die "Cannot open $dhcplog: $!";
sub setlast {
my $last = shift;
my $f;
open $f, '>', $lastfile or return;
print $f $last,$/; # last is first line
report($f); # report is the rest of the file
close $f;
}
sub getlast {
local @ARGV = $lastfile;
my $l = <>; # last is first line
$l ||= 0; # or start from beginning
$l = 0 if $l > -s $logfile; # handle rotate/truncate
return $l;
}
my $last = getlast();
seek $logfile, $last, 0;
while (<$logfile>) {
# process lines
}
setlast(tell $logfile); # remember where we stopped
close $logfile;
# mail report if needed
exit;
sub report {
# returns report based on lines processed
}
Don't forget to handle cases such as 'first time run' and 'file mysteriously shrunk'. This script keeps the last byte processed (and a copy of the report) in a file in /var/run.
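Since the name statdhcpcron.last suggests this runs from cron, a hypothetical crontab entry (the installed path is my assumption) could drive it:

*/10 * * * * /usr/local/bin/statdhcpcron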
I think a simple tell/seek algorithm, even combined with a check for file size == 0, is not enough. Your log file might have been truncated, rewritten, and then grown bigger than it was before. If you seek to the previous position, the script will still run, but now it has an undetected logic error. You need to record the position plus the last line you have seen; then you can verify that it really is the line you last visited.
This will work if the log is not a rotating log. What if it is? Uh... my head hurts.
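For the rotating case, one common trick (my own sketch, not from the posts above) is to store the file's device and inode alongside the offset; a changed inode means the log was rotated, so you start over at byte 0. load_state() and save_state() are hypothetical helpers that keep (dev, inode, offset) in a state file between runs:

use strict;
use warnings;

my ($saved_dev, $saved_ino, $saved_pos) = load_state();  # hypothetical helper

open my $log, '<', '/var/log/messages' or die "open: $!";
my ($dev, $ino) = stat $log;             # first two fields of stat()

if (defined $saved_ino && $dev == $saved_dev && $ino == $saved_ino) {
    seek $log, $saved_pos, 0;            # same file as last time: resume
}                                        # otherwise: rotated, read from the top

while (my $line = <$log>) {
    # process $line ...
}

save_state($dev, $ino, tell $log);       # hypothetical helper
close $log;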
The following is my attempt during my lunch break. (OK, I have deliberately chosen not to use Pod::Usage.)
use strict;
use Getopt::Long;
use IO::File;
use Data::Dumper;
GetOptions
(
'i|input=s' => \( my $INPUT = "./access.log" ),
'l|lastpos=s' => \( my $LASTPOS = "./lastpos.txt" ),
'f|feedback' => \( my $FEEDBACK = undef ),
);
unless ( defined $INPUT && defined $LASTPOS )
{
print <<EOF;
Logfile Parser - Parse input log efficiently
Usage: $0 [option]
Options:
-i|--input [filename] Specify the input log file name.
-l|--lastpos [filename] Specify the name of last pos file.
-f|--feedback Let the program print progress prompt.
EOF
exit(1);
}
# load the last pos information
my $lastinfo;
$lastinfo = ReadLastPosFile($LASTPOS) if -f $LASTPOS;
print "Last position:\n", Dumper($lastinfo) if ($FEEDBACK);
# verify the log file
my $begin_pos = VerifyLastPosition($INPUT, $lastinfo);
# process the log file
my $f = new IO::File $INPUT, "r" or die "Could not open log file";
if ($begin_pos == -1) {
die "Log file has not been changed since last run";
} else {
seek $f, $begin_pos, 0; # seek to start of next line
}
my $next_pos = $begin_pos;
my $next_line;
while ($next_line = <$f>)
{
$begin_pos = $next_pos;
$next_pos = tell $f;
# process the log file here
chomp($next_line);
print "$next_line\n";
}
# at this point, $begin_pos is the offset of the start of the last line
seek $f, $begin_pos, 0;
$next_line = <$f>;
$next_line = '' unless defined $next_line; # guard against an empty log
chomp $next_line;
$lastinfo->{pos} = $begin_pos;
$lastinfo->{text} = $next_line;
print "Last Pos Info:\n", Dumper($lastinfo) if ($FEEDBACK);
# ok, write the last info back to file
WriteLastPosFile($LASTPOS, $lastinfo);
exit(0);
sub ReadLastPosFile
{
# last pos file format - <pos>|<last-line-seen>
my $filename = shift;
my $f = new IO::File $filename, "r"
or die "Could not open lastpos file";
chomp(my $info = <$f>);
my %lastinfo;
($lastinfo{pos}, $lastinfo{text}) = $info =~ /(\d+)\|(.*)/;
return \%lastinfo;
}
sub WriteLastPosFile
{
my ($filename, $lastinfo) = @_;
my $f = new IO::File $filename, "w"
or die "Could not write to lastpos file";
printf $f "%s|%s\n", $lastinfo->{pos}, $lastinfo->{text};
}
sub VerifyLastPosition
{
my ($logfile, $lastinfo) = @_;
return 0 unless $lastinfo; # first run: no lastpos file yet, start at the top
my $f = new IO::File $logfile, "r" or die "Could not open log file";
seek $f, 0, 2; # seek to the end of the file
my $eof = tell $f;
return 0 if $lastinfo->{pos} >= $eof; # ok, file has been trimmed
seek $f, $lastinfo->{pos}, 0;
chomp(my $line = <$f>); # retrieve what we believe was the last line
return 0 if $line ne $lastinfo->{text}; # ok, file has been trimmed
my $begin_pos = tell $f; # otherwise start from next line
# -1 means the file has not been changed since
# last time it was parsed.
return $eof == $begin_pos ? -1 : $begin_pos;
}
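If the script above is saved as, say, parselog.pl (the name is my assumption), a typical run would be:

perl parselog.pl --input /var/log/access.log --lastpos ./lastpos.txt --feedback

Run it twice in a row without the log changing and the second run dies with "Log file has not been changed since last run", which is the intended behaviour.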
Off topic, but let me get this straight: you have a log file that will grow from now until the end of time? That can make for a pretty large file, and every operating system that I know of has a limit on how large a file can get. Do you have any way to rotate the log? We have logs from an app at work that write to a new log once a day, so every day has a new log.
Now, back to the matter at hand. Is there any way you can uniquely identify the lines in the file? If so, you could (and should) set up a primary key/unique index on the database table that you're inserting into. This will prevent duplicate data from ever entering the database, so you'll be guarded on two fronts.
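A sketch of that guard using DBI with SQLite; the table layout, column names, and the whitespace-split line format are my assumptions, not from the post:

use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=log.db', '', '',
                       { RaiseError => 1 });
# The unique index makes re-inserting an already-seen line a no-op.
$dbh->do(q{CREATE TABLE IF NOT EXISTS entries (
               stamp TEXT, host TEXT, msg TEXT,
               UNIQUE (stamp, host, msg))});
my $ins = $dbh->prepare('INSERT OR IGNORE INTO entries VALUES (?, ?, ?)');
while (my $line = <LOG>) {
    my ($stamp, $host, $msg) = split ' ', $line, 3;  # assumed line layout
    $ins->execute($stamp, $host, $msg);  # duplicate rows are silently skipped
}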
thor
-r-xr-x--- 1 log 12345 Jan 2003 log.1.gz
-r-xr-x--- 1 log 13451 Jan 2002 log.2.gz
...
Some logs grow really slowly. Why rotate daily/weekly/... when yearly, or by size, will do? I guess it just depends...
Probably the best way is to keep track of the number of bytes you have read, then save that value to a file. On the next run, you can read that value back and use seek to go there.
Alternatively, you could use my $size = -s "filename"; when you're done, though that's a potential race condition (something could append to the file between your last read and the -s call).
Update: Never mind. Forgot about tell.
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
: () { :|:& };:
Note: All code is untested, unless otherwise stated
Maybe I'm cheating on this answer a little, but I believe it's not a question of where you stopped parsing, but of what you stopped parsing at. Logfiles have (or should have) a tendency to roll over (to be archived).
Since this question involves a logfile (and most likely all entries start with a date), you might want to consider writing the date of the last line you parsed to a temp file. When you re-parse the logfile, ignore all lines with dates before that date. This might be a little heavy on system resources, though (especially with a lot of entries).
If the old entries are not really meaningful to you after parsing, you might even consider deleting them after parsing. That would solve the problem, but you probably wouldn't be asking this question if that could be the solution ;)
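A sketch of that date filter; parse_stamp() is a hypothetical helper that turns a log line's date into epoch seconds, and the state-file path is assumed:

my $statefile = '/tmp/lastdate.txt';   # assumed location

my $last_seen = 0;                     # first run: accept everything
if (open my $fh, '<', $statefile) {
    my $line = <$fh>;
    close $fh;
    if (defined $line) { chomp $line; $last_seen = $line; }
}

open my $log, '<', '/var/log/messages' or die "open: $!";
my $newest = $last_seen;
while (my $line = <$log>) {
    my $stamp = parse_stamp($line);    # hypothetical: line's date as epoch
    next unless defined $stamp;
    next if $stamp <= $last_seen;      # seen on a previous run; skip it
    $newest = $stamp if $stamp > $newest;
    # process $line ...
}
close $log;

open my $out, '>', $statefile or die "write $statefile: $!";
print $out "$newest\n";                # remember the newest date handled
close $out;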