http://qs1969.pair.com?node_id=204973

submersible_toaster has asked for the wisdom of the Perl Monks concerning the following question:

Hello PerlMonks,

Having an idea in mind, but not sure how to go about it, I seek your collective wisdom.

Presently I have a script watching/analysing the access.log from squid. Being a little dubious about opening this file whilst squid is busy with it, my script gives squid a 'squid -k rotate', then reads the rotated log, access.log.0.

On reflection, this is pretty bad magic - particularly when squid is being flogged by our users, it can take some time before it gets around to rotating the log.

I'd like to find a better way to read this log periodically, with as short or long a period as desired. My first thought was to make the script aware of what position/byte/line was last read when it was last run, and hence begin parsing the log from that point. This would require some form of persistence, and some extra filehandle voodoo such as provided by FileHandle->getpos, i.e. something along these lines (untested, and the log name is just an example):
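use FileHandle;

# grab an opaque position token now, come back to it later
my $fh = FileHandle->new('<access.log') or die "open: $!";
my $pos  = $fh->getpos;    # remember where we are
my $line = <$fh>;          # read on...
$fh->setpos($pos);         # ...and jump back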

Second idea was to use some kind of pipe arrangement, where access.log is a special file with a script connected to the other end.

Where does this leave me? Right now I have One Idea, Half a Script, >=2 Directions to go, and 0 Clue what to do next. Please Help.

Re: Picking up where you left off..
by IndyZ (Friar) on Oct 14, 2002 at 05:52 UTC
    Second idea was to use some kind of pipe arrangement, where access.log is a special file with a script connected to the other end.

    This is a named pipe (FIFO), and might work. It's just a special file that one or more processes can write to, and another process can read from. If you are on a Unix system, check out the manpage for mkfifo for more info, or ask Google.
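    For the reader side, something like the sketch below would do; the FIFO path is just a placeholder, and pointing squid's access log at it is left to you. Note that opening a FIFO for reading blocks until a writer shows up on the other end.

    #!/usr/bin/perl -w
    # Reader end of a named pipe. Create it first with: mkfifo /tmp/squid.fifo
    # (the path, and the idea of pointing squid at it, are assumptions.)
    use strict;

    # this open blocks until the writer opens the other end
    open( FIFO, '</tmp/squid.fifo' ) or die "Can't open FIFO: $!";
    while ( my $line = <FIFO> ) {
        print "got: $line";    # parse the access.log line here
    }
    close FIFO;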

    I would also check out the File::Tail module. Its documentation does a better job of explaining how it works than I can.
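    A minimal sketch of the usual pattern (untested, and the log path is assumed):

    use File::Tail;

    # File::Tail polls the file for new data, sleeping (possibly for
    # fractions of a second) between checks.
    my $tail = File::Tail->new(
        name        => '/usr/local/squid/logs/access.log',
        maxinterval => 10,    # never wait more than 10s between checks
    );
    while ( defined( my $line = $tail->read ) ) {
        print $line;          # each access.log line as it arrives
    }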

    --
    IndyZ

      :( Sadly I don't think I can wrangle FIFOs to my needs. I read the File::Tail docs (why does it need Time::HiRes?), which would be cool for a daemon-type execution, but I was hoping to only run this script at intervals.

      Closer inspection of Perl in a Nutshell led me to this...

      use strict;

      # read back the offset we saved last run
      # ('counter' must be seeded with 0 the first time)
      open( COUNT, '<counter' ) or die "Can't open counter: $!";
      my $whence = scalar <COUNT>;
      chomp $whence;
      close COUNT;

      open( FH, '<logfile' ) or die "Can't open logfile: $!";
      # scram to position we wrote last to 'counter'
      seek FH, $whence, 0;
      my $line = scalar <FH>;
      print $line;
      my $count = tell FH;

      # save the new offset for next run
      open( COUNT, '>counter' ) or die "Can't open counter: $!";
      print COUNT $count;
      close COUNT;
      Which exhibits the behaviour(s) that I believe are needed: in this test case, the script prints the next single line from 'logfile' and exits, saving its position in logfile to 'counter'. OK, so I had to seed counter with '0' first!!

      This is destined to run on rh7.3, perl 5.6.1 - what scares me most is how squid will behave with another process reading from the logfile it's writing to.

        Why are you so dubious about opening the squid log for read access while squid writes to it? People do this sort of thing all the time. For example, many people sit with an xterm open doing nothing but tail -f logfile. Just imagine if it was harmful: "Just a sec, I will check the log file for diagnostic messages. Oh oops! I have to rotate the log files/stop then restart the daemon!". Yuk.

        If you do start scanning the current log at random intervals, starting from the file position that you got up to in the previous scan, you will need to take into account the default cron jobs that rotate the logs daily (I think). Have a look at the cron jobs on the machine and (at least on some Linux machines) /etc/logrotate.conf. A sketch of the kind of check you'd need (the path and the remembered values are placeholders):
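        use File::stat;

        my $logfile   = '/usr/local/squid/logs/access.log';  # assumed path
        my $saved_ino = 12345;    # inode remembered from the previous scan
        my $offset    = 67890;    # byte offset remembered from the previous scan

        my $st = stat($logfile) or die "stat $logfile: $!";
        if ( $st->ino != $saved_ino ) {
            # logrotate swapped the file out from under us; the old offset
            # now points into a different file, so start over from the top
            $offset    = 0;
            $saved_ino = $st->ino;
        }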

        --blm--
Re: Picking up where you left off..
by perrin (Chancellor) on Oct 14, 2002 at 15:34 UTC
    Your first idea is better. It's simple and easy to implement. You can store the current position in the file with Storable if you like, and you should be able to move around in the file with a basic "seek" command.
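    A minimal sketch of that approach (the file names are placeholders):

    use strict;
    use Storable qw(store retrieve);

    my $statefile = 'squidmon.state';    # assumed name for the saved state
    my $state = -e $statefile ? retrieve($statefile) : { pos => 0 };

    open( LOG, '</usr/local/squid/logs/access.log' ) or die "Can't open log: $!";
    seek LOG, $state->{pos}, 0;          # pick up where the last run stopped
    while ( my $line = <LOG> ) {
        # ... parse $line ...
    }
    $state->{pos} = tell LOG;            # remember how far we got
    close LOG;
    store $state, $statefile;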
Re: Picking up where you left off..
by submersible_toaster (Chaplain) on Oct 15, 2002 at 08:34 UTC

    Thank you everyone for your ideas and comments; I now have a prototype that works for me (pending more tests). The program 'remembers' and can 'recollect' two particular values (tell FH, $st_ino). This behaviour is coded by hand for now, but if it gets too cumbersome then I think Data::Dumper (which is now my latest fav module) will help me out.

    blm - you were right to warn me about those rotating logs, and I am experimenting with some 'awareness' in the program for coping with this. Though I'll be reading some more before I settle on this, it appears that keeping track of the inode number given by stat gives an indication as to the 'sameness' of a given file. That is, after a logrotate the file /usr/local/squid/logs/access.log possessed a different inode number. I am using this approach only because I can determine no other way (TMTOWTDI) to glean this info.

    #!/usr/bin/perl -w
    package HotSaNIC::squid;
    use strict;
    use File::stat;
    use Data::Dumper;
    use Carp;

    sub new {
        my $self     = shift;
        my $settings = shift;
        # Open the settings file, slurp into a hash.
        my %settings;
        open( FILE, $settings ) || croak "Can't open settings, $settings: $!";
        while (<FILE>) {
            # throw out comments/blanks
            next if ( $_ =~ /^#/ );
            next if ( $_ =~ /^\s+$/ );
            chomp;
            my ( $var, $val ) = split '=', $_;
            $settings{$var} = $val;
        }
        close FILE;
        $settings{TIME} = time;
        return bless {%settings}, $self;
    }

    # Open a file to read persistent data.
    # recollect returns the offset from BOF for seek (where)
    # and the inode number of the log when we saw it last.
    sub recollect {
        my $self = shift;
        my ( $where, $node, $data );
        open( F, '<.p' ) || return ( 0, 0 );
        $data = scalar <F>;
        close F;
        $data =~ /(\d+):(\d+)/;
        $where = $1;
        $node  = $2;
        print "Recollected $where, $node\n";
        return ( $where, $node );
    }

    # Write info back to the persistent file.
    sub remember {
        my $self = shift;
        my ( $where, $node ) = @_;
        open( F, '>.p' ) || croak "Can't open persisting file";
        print F "$where:$node";
        close F;
    }

    sub go {
        my $self = shift;
        my ( $where, $node, $logstat );
        my ( $time, $duration, $result, $bytes );
        ( $where, $node ) = $self->recollect;
        $logstat = stat( $self->{LOG} );
        # Compare the inode of the log against the persistent info.
        if ( $node != $logstat->ino ) {
            # Oops, the log has been rotated since we last ran.
            print "Oh crap, someone spun the log beneath us!\n";
            $self->remember( 0, $logstat->ino );
            $where = 0;
        }
        # Fiddle through the log.
        open( LOG, $self->{LOG} ) || croak "Cannot open log, $!";
        seek LOG, $where, 0;
        $self->{POS}           = tell LOG;
        $self->{DURATIONMIN}   = 999999;    # A very big bogus number
        $self->{DURATIONMAX}   = 0;
        $self->{BYTESHIT}      = 0;
        $self->{BYTESMISS}     = 0;
        $self->{GROSSREQUESTS} = 0;
        while (<LOG>) {
            if ( $self->inInterval($_) ) {
                # DO Something with this VALUABLE information.
                # Moreover, set $self->{POS} to mark position in file.
                $self->{POS} = tell LOG;
                ( $time, $duration, $result, $bytes, undef ) =
                    $self->breakLine($_);
                # Tally up some figures.
                if ( $result =~ /HIT/ )  { $self->{BYTESHIT}  += $bytes }
                if ( $result =~ /MISS/ ) { $self->{BYTESMISS} += $bytes }
                if ( $duration > $self->{DURATIONMAX} ) {
                    $self->{DURATIONMAX} = $duration;
                }
                $self->{GROSSREQUESTS}++;
                if ( $duration < $self->{DURATIONMIN} ) {
                    $self->{DURATIONMIN} = $duration;
                }
            }
        }
        close LOG;
        if ( $self->{DURATIONMIN} == 999999 ) { $self->{DURATIONMIN} = 'U' }
        if ( $self->{DURATIONMAX} == 0 )      { $self->{DURATIONMAX} = 'U' }
        $self->remember( $self->{POS}, $logstat->ino );
        print Dumper $self;
    }

    # Splitter for squid access logs.
    sub breakLine {
        my $self = shift;
        my $line = shift;
        my @data = split " ", $line;
        splice @data, 2, 1;
        return @data;
    }

    # Determine whether the passed line falls in our interval range.
    sub inInterval {
        my $self = shift;
        my $line = shift;
        $line =~ /^(\d+\.\d+)\s+/;
        my $time = $1;
        ( $time < $self->{TIME} ) and ( $time > ( $self->{TIME} - 60 ) )
            and return $line;
        return undef;
    }

    package main;
    my $zz = HotSaNIC::squid->new('settings');
    $zz->go;

    Ok Monks, go ballistic - I'm pleased I got this far, but it's still pretty messy. Thank you all again for your help.