Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Heavenly Beloved,
I need divine guidance on this one as I am fairly new to Perl and haven't seen any examples on how to do the following:

1. Open and read from a file.
2. Close the file
3. Open the file again from the offset from which it was closed previously.

Basically, I have a log file I need to parse every night, and unfortunately it continues to grow every day. So each night I will need to open it, parse it, and put some data in a database, and I need an efficient means of knowing where the file was left off the previous day so as not to introduce redundant data. All guidance, wisdom or penance greatly appreciated....
  • Comment on Opening a file at a designated offset based on closing of the file

Replies are listed 'Best First'.
Re: Opening a file at a designated offset based on closing of the file
by meetraz (Hermit) on Nov 10, 2003 at 21:42 UTC
    You can use the tell() function to get the current file offset, and the seek() function to get back to that offset once you've reopened the file.

    use strict;

    ## obtain $LastPosition
    open(LOG, '/var/log/messages');
    seek(LOG, $LastPosition, 0);

    while (my $line = <LOG>) {
        chomp $line;
        ## Process $line;
    }

    $LastPosition = tell(LOG);
    close(LOG);

    ## store $LastPosition
Re: Opening a file at a designated offset based on closing of the file
by shenme (Priest) on Nov 10, 2003 at 21:51 UTC
    I believe you are looking for tell() and seek(). You would use tell() to get your position before you close the file, then use seek() to move to that same position before you start reading again.

    Some bits-o-code from some bits-o-dreck-o-mine:

    # Close the log file and save information for next log check.
    # Remember where (and when) we stopped reading for next time.
    $FTPTrkLastReadPos  = tell(FTPLOG);
    $FTPTrkLastReadTime = time;
    close(FTPLOG);
    and
    # Get current characteristics of the log file
    my ($filesize,$modified,$created) = (stat(_))[7,9,10];
    $FTPTrkLastReadPos = 0 if( $filesize < $FTPTrkLastReadPos );

    # Process any and all new entries
    seek FTPLOG, $FTPTrkLastReadPos, 0;
    Why do I check the filesize of the file before reading it? Because someone may have deleted, reduced or truncated the file since I last read it. In my case I am able to handle this gracefully because I can look at the timestamps in each log entry and see if I have processed them before.
Re: Opening a file at a designated offset based on closing of the file
by zengargoyle (Deacon) on Nov 10, 2003 at 22:23 UTC

    here's a little script that does just that.

    my $dhcplog  = '/var/log/dhcpd';
    my $lastfile = '/var/run/statdhcpcron.last';

    my $logfile;
    open $logfile, '<', $dhcplog;

    sub setlast {
        my $last = shift;
        my $f;
        open $f, '>', $lastfile or return;
        print $f $last,$/;           # last is first line
        report($f);                  # report is the rest of file
        close $f;
    }

    sub getlast {
        local @ARGV = $lastfile;
        my $l = <>;                  # last is first line
        $l ||= 0;                    # or start from beginning
        $l = 0 if $l > -s $logfile;  # handle rotate/truncate
        return $l;
    }

    my $last = getlast();
    seek $logfile, $last, 0;

    while (<$logfile>) {
        # process lines
    }

    setlast(tell $logfile);          # remember where we stopped
    close $logfile;

    # mail report if needed
    exit;

    sub report {
        # returns report based on lines processed
    }

    don't forget to handle cases such as 'first time run' and 'file mysteriously shrunk'. this script keeps the last byte processed (and a copy of the report) in a file in /var/run.

Re: Opening a file at a designated offset based on closing of the file
by Roger (Parson) on Nov 11, 2003 at 00:45 UTC
    I think a simple tell -> seek algorithm, even combined with a file-size check, is not enough. Your log file might have been truncated, re-written to, and then grown bigger than it was before. If you then seek to the previous position, the script will still run, but with an undetected logic error. You will need to record the position plus the last line you have seen, so you can verify that it really is the line you last visited.

    This will work if the log is not a rotating log. What if it is? Uh... my head hurts.

    The following is my attempt during my lunch break - (OK, I have deliberately chosen not to use Pod::Usage)
    use strict;
    use Getopt::Long;
    use IO::File;
    use Data::Dumper;

    GetOptions (
        'i|input=s'   => \( my $INPUT    = "./access.log" ),
        'l|lastpos=s' => \( my $LASTPOS  = "./lastpos.txt" ),
        'f|feedback'  => \( my $FEEDBACK = undef ),
    );

    unless ( defined $INPUT && defined $LASTPOS ) {
        print <<EOF;
    Logfile Parser - Parse input log efficiently

    Usage: $0 [option]

    Options:
      -i|--input [filename]    Specify the input log file name.
      -l|--lastpos [filename]  Specify the name of last pos file.
      -f|--feedback            Let the program print progress prompt.
    EOF
        exit(1);
    }

    # load the last pos information
    my $lastinfo;
    $lastinfo = ReadLastPosFile($LASTPOS) if -f $LASTPOS;
    print "Last position:\n", Dumper($lastinfo) if ($FEEDBACK);

    # verify the log file
    my $begin_pos = VerifyLastPosition($INPUT, $lastinfo);

    # process the log file
    my $f = new IO::File $INPUT, "r" or die "Could not open log file";
    if ($begin_pos == -1) {
        die "Log file has not been changed since last run";
    }
    else {
        seek $f, $begin_pos, 0;  # seek to start of next line
    }

    my $next_pos = $begin_pos;
    my $next_line;
    while ($next_line = <$f>) {
        $begin_pos = $next_pos;
        $next_pos  = tell;
        # process the log file here
        chomp($next_line);
        print "$next_line\n";
    }

    # at here, begin pos is the position of the last line
    seek $f, $begin_pos, 0;
    chomp($next_line = <$f>);

    $lastinfo->{pos}  = $begin_pos;
    $lastinfo->{text} = $next_line;
    print "Last Pos Info:\n", Dumper($lastinfo) if ($FEEDBACK);

    # ok, write the last info back to file
    WriteLastPosFile($LASTPOS, $lastinfo);
    exit(0);

    sub ReadLastPosFile {
        # last pos file format - <pos>|<last-line-seen>
        my $filename = shift;
        my $f = new IO::File $filename, "r"
            or die "Could not open lastpos file";
        chomp(my $info = <$f>);
        my %lastinfo;
        ($lastinfo{pos}, $lastinfo{text}) = $info =~ /(\d+)\|(.*)/;
        return \%lastinfo;
    }

    sub WriteLastPosFile {
        my ($filename, $lastinfo) = @_;
        my $f = new IO::File $filename, "w"
            or die "Could not write to lastpos file";
        printf $f "%s|%s\n", $lastinfo->{pos}, $lastinfo->{text};
    }

    sub VerifyLastPosition {
        my ($logfile, $lastinfo) = @_;
        my $f = new IO::File $logfile, "r" or die "Could not open log file";
        seek $f, 0, 2;            # seek to the end of the file
        my $eof = tell $f;
        return 0 if $lastinfo->{pos} >= $eof;    # ok, file has been trimmed
        seek $f, $lastinfo->{pos}, 0;
        chomp(my $line = <$f>);   # retrieve what we believe was the last line
        return 0 if $line ne $lastinfo->{text};  # ok, file has been trimmed
        my $begin_pos = tell $f;  # otherwise start from next line
        # -1 means the file has not been changed since
        # last time it was parsed.
        return $eof == $begin_pos ? -1 : $begin_pos;
    }
Re: Opening a file at a designated offset based on closing of the file
by thor (Priest) on Nov 11, 2003 at 00:26 UTC
    Off topic, but let me get this straight. You have a log file that will grow from now until the end of time? That can make for a pretty large file, and every operating system that I know of has a limit on how large a file can get. Do you have any way to rotate the log? We have an app at work that writes to a new log once a day, so every day gets a fresh file.

    Now, back to the matter at hand. Is there any way that you can uniquely identify your lines in the file? If so, you could (and should) set up a primary key/unique index on the database table that you're inserting into. This will prevent the duplicate data from ever entering the database, so you'll be guarded on two fronts.
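    thor's unique-index idea can be sketched without a database at hand. In this untested illustration a plain hash stands in for the table's primary key, and the key format (a made-up timestamp-plus-id pair) is purely hypothetical; with a real table, a PRIMARY KEY or UNIQUE constraint would do the rejecting for you:

```perl
use strict;
use warnings;

# A hash stands in for the table's unique index: the key is whatever
# uniquely identifies a log line (here, a hypothetical timestamp#id pair).
my %unique_index;

sub insert_row {
    my ($key, $data) = @_;
    return 0 if exists $unique_index{$key};  # duplicate: rejected, as a
                                             # PRIMARY KEY violation would be
    $unique_index{$key} = $data;
    return 1;
}

# Re-running the parser over overlapping data is now harmless:
my @batch = (
    [ '2003-11-10 21:42:00#1234' => 'login from hostA' ],
    [ '2003-11-10 21:43:00#1235' => 'login from hostB' ],
    [ '2003-11-10 21:42:00#1234' => 'login from hostA' ],  # seen before
);

my $inserted = 0;
$inserted += insert_row(@$_) for @batch;
print "$inserted rows inserted\n";   # the duplicate is silently skipped
```

    The point is that the dedup check lives with the data store, not the parser, so even a botched seek only costs you some rejected inserts.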

    thor

      i would agree, except i've found logs that look like:

      -r-xr-x---  1 log  12345  Jan 2003  log.1.gz
      -r-xr-x---  1 log  13451  Jan 2002  log.2.gz
      ...

      some logs can grow real slow. why rotate daily/weekly/... when yearly or by size will do. guess it just depends...

Re: Opening a file at a designated offset based on closing of the file
by hardburn (Abbot) on Nov 10, 2003 at 21:50 UTC

    Probably the best way is to keep track of the number of bytes you read in, then save that value to a file. Upon the next run, you can read that value back and then use seek to go there.

    Alternatively, you could use my $size = -s "filename"; when you're done, though that has a potential race condition (if something appends to the file after you take the size).

    Update: Never mind. Forgot about tell.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

Re: Opening a file at a designated offset based on closing of the file
by b10m (Vicar) on Nov 11, 2003 at 10:13 UTC

    Maybe I cheat on this answer a little, but I believe it's not a question of where you stopped parsing, but with what. Logfiles have (or should have) the tendency to roll over (to be archived).

    Since this question involves a logfile (and most likely all entries start with a date), you might want to consider writing the last date of the line you parsed to a temp. file. When you re-parse the logfile, ignore all lines with dates before that date. This might be a little heavy on system resources though (especially with a lot of entries).
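    A rough, untested sketch of that date-based approach, assuming lines begin with a sortable `YYYY-MM-DD HH:MM:SS` stamp (the sample lines and saved value are made up; in practice $last_seen would be read from and written back to the temp file):

```perl
use strict;
use warnings;

# The timestamp of the last line parsed on the previous run, normally
# loaded from a small state file; hard-coded here for illustration.
my $last_seen = '2003-11-10 21:42:17';

my @lines = (
    "2003-11-10 21:42:16 ftpd: connection from hostA",
    "2003-11-10 21:42:17 ftpd: login hostA",
    "2003-11-10 21:42:59 ftpd: disconnect hostA",
);

my @new_lines;
for my $line (@lines) {
    # ISO-style stamps sort lexically, so a plain string compare works
    my ($stamp) = $line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/ or next;
    next if $stamp le $last_seen;   # already parsed on a previous run
    push @new_lines, $line;
}

print scalar(@new_lines), " new line(s)\n";  # only the 21:42:59 entry
```

    One caveat: if several lines share the exact last-seen timestamp, the unparsed ones get skipped too, which is part of why the byte-offset (tell/seek) answers above are the more precise bookmark.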

    If the logfile (old entries) is not really meaningful to you after parsing, you might even consider deleting those entries after parsing. That would solve the problem, but you probably wouldn't be asking this question if that were an option ;)

    --
    B10m