widedave has asked for the wisdom of the Perl Monks concerning the following question:

I need to regularly monitor several hundred log files spread over approximately 100 servers. By regularly I mean roughly every 120 seconds: I have a cron entry that runs my log-check script every 2 minutes.

Currently, I'm using File::Remote to open the files remotely. On each pass I use tell() to record the byte offset where I stopped reading and write it to disk (call it $position). On the next pass I use seek() with that $position to jump to the last position checked in the file and continue processing (i.e. checking for errors).

My problem is two-fold. First, efficiency: a single pass over all the files takes longer than the 120 seconds my cron job allows. I could increase the delay between script runs, but it's important that errors are checked at least every 2 minutes, so I'd rather not go that route. Second, while traversing a file opened by File::Remote, the loop will hang without warning, often on different files each run. My tests have all run after production hours, so I've done my best to rule out production processes interfering with my logfile reading.

Here's my code. Thanks for any help!
#!/usr/bin/perl -w
use File::Remote;

# open the error file that collects all of the log files'
# errors (append mode, since we write to it below)
open(ERROR, ">>error.txt") or die $!;

# open config file containing log file names and server
# locations, one "server|logfile" pair per line
open(SERVER_LOGS, "<server_logs.txt") or die $!;

while (<SERVER_LOGS>) {
    chomp;
    # get server and logfile name; the pipe must be escaped,
    # since a bare | is alternation in the split pattern
    my ($server, $logfile) = split(/\|/, $_);

    # open file containing the last file position read by
    # the script (unique file for each logfile)
    open(POSITION, "<${server}.${logfile}.line_position") or die $!;
    my $position;
    while (<POSITION>) {
        $position = $_;
        chomp $position;
    }
    close(POSITION);

    # create new File::Remote object, back up the remote
    # logfile and open a file handle to it
    my $remote_fh = uc "${server}${logfile}";
    my $remote = File::Remote->new();
    $remote->backup("${server}:/log/${logfile}", "sav");
    $remote->open($remote_fh, "<${server}:/log/${logfile}") or die $!;

    # seek to last position in file that was read
    seek($remote_fh, $position, 0);

    # traverse this logfile, writing errors to the error file
    while (<$remote_fh>) {
        chomp;
        ### SCRIPT STALLS UNPREDICTABLY HERE ###
        if ($_ =~ /^ERROR/) {
            print ERROR "$_\n";
        }
    }

    # get current file position and write it back to
    # the position file for the next run
    my $line_position = tell($remote_fh);
    open(POSITION, ">${server}.${logfile}.line_position") or die $!;
    print POSITION $line_position;
    close(POSITION);

    $remote->close($remote_fh);
}
close ERROR;
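One fragile spot in the code above is the saved offset: the position file won't exist on the first run (so the open ... or die aborts), and if a log is rotated or truncated the saved offset can point past end-of-file. A minimal sketch of a more defensive read; the helper name (saved_position) and file names are invented for illustration:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: read a saved byte offset, falling back to 0
# when the position file does not yet exist or its contents are bad.
sub saved_position {
    my ($posfile, $logsize) = @_;
    my $position = 0;
    if (open(my $fh, "<", $posfile)) {
        $position = <$fh>;
        close($fh);
        chomp $position if defined $position;
        $position = 0 unless defined $position && $position =~ /^\d+$/;
    }
    # If the log is now smaller than the saved offset, it was
    # truncated or rotated, so start again from the top.
    $position = 0 if $position > $logsize;
    return $position;
}

# Example: no position file yet, 100-byte log -> start at offset 0.
print saved_position("no_such_file.pos", 100), "\n";   # prints 0
```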

Replies are listed 'Best First'.
Re: working with remote log files
by George_Sherston (Vicar) on Oct 13, 2001 at 19:28 UTC
    If it's choking on chomp, why not do the chomp inside the if ($_ =~ /^ERROR/) block? <carp type="gratuitous">BTW this could be written if (/^ERROR/) </carp> Not sure WHY this would fix it, as I'm not au fait with the ramifications of line terminators, but as I read it you don't actually need to chomp every line, since chomping won't affect whether a line passes or fails your if test.
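    A sketch of the loop with both suggestions applied, reading from an in-memory string rather than the remote handle so it stands alone:

```perl
#!/usr/bin/perl -w
use strict;

# Sample log lines in a string; open a handle on a scalar ref
# so the example is self-contained.
my $log = "INFO startup ok\nERROR disk full\nINFO heartbeat\nERROR timeout\n";
open(my $fh, "<", \$log) or die $!;

my @kept;
while (<$fh>) {
    if (/^ERROR/) {    # $_ is matched implicitly
        chomp;         # only chomp the lines we actually keep
        push @kept, $_;
        print "$_\n";
    }
}
close($fh);
```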

    § George Sherston
Re: working with remote log files
by cLive ;-) (Prior) on Oct 13, 2001 at 21:07 UTC
    Is there any reason why each machine can't run a cron job every two minutes to create an error summary and report back to you, rather than you polling them all?

    For example, if you install OpenSSL and Crypt::SSLeay on the remote machines, then the callbacks can be sent via SSL.

    That way, you are minimising network transfer and reducing the load on the 'host' machine.
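    A minimal sketch of what each machine's local cron job might look like; the file names and report callback are invented for illustration (in practice the callback would send the summary over SSL, e.g. with LWP::UserAgent plus Crypt::SSLeay):

```perl
#!/usr/bin/perl -w
use strict;

# Scan one local log for ERROR lines and hand a summary to a
# callback. Reporting even when nothing matched lets the collector
# tell "no errors" apart from "script not running".
sub check_log {
    my ($logfile, $report) = @_;
    my @errors;
    open(my $log, "<", $logfile) or die "$logfile: $!";
    while (<$log>) {
        chomp;
        push @errors, $_ if /^ERROR/;
    }
    close($log);
    $report->(scalar @errors, \@errors);
}

# Demo against a throwaway local log file:
open(my $out, ">", "demo.log") or die $!;
print $out "INFO ok\nERROR oops\n";
close($out);

check_log("demo.log", sub {
    my ($count, $lines) = @_;
    print "$count error(s): @$lines\n";   # prints: 1 error(s): ERROR oops
});
```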

    .02

    cLive ;-)

      Can you imagine deploying and keeping this software up to date on 100 servers?
      You would also need something in place that checks whether the script is still executing on each server.

      Just adding my 2 cents.

        That's what rsync is for. It's no more difficult than managing anything else on hundreds of machines. As for making sure it's still running, have it submit a null report even when nothing went wrong. Then you just have to look for who didn't submit anything to find the machines where it isn't running.
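        A sketch of such an rsync push; hosts.txt, the script name, and the target path are all made up, and DRY_RUN makes the loop just echo the commands so it can be run harmlessly:

```shell
#!/bin/sh
# Push the checker script to every host listed in hosts.txt.
# Set DRY_RUN to empty to actually run the rsync commands.
DRY_RUN=echo

printf 'web01\nweb02\nweb03\n' > hosts.txt

while read host; do
    $DRY_RUN rsync -az checklog.pl "$host:/usr/local/bin/checklog.pl"
done < hosts.txt
```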

Re: working with remote log files
by data64 (Chaplain) on Oct 13, 2001 at 23:24 UTC

    I noticed that you are approaching this sequentially, that is, processing the 100 servers one at a time.

    One way of speeding it up would be to have parallel processes. With 10 parallel processes, each only has to handle 10 servers. Threads would probably be more efficient than processes, but I do not know how usable threads are in Perl.

    Of course, you have to realize that this adds a whole new level of complexity, as you have to deal with multi-process synchronization issues. In other words, when you need to write to your error file it has to be properly locked, etc.
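    A sketch of the batching idea using fork, with flock serializing writes to the shared error file; the server names and the per-server check are placeholders, not from the original script:

```perl
#!/usr/bin/perl -w
use strict;
use Fcntl qw(:flock);
use POSIX qw(ceil);

my @servers = map { "server$_" } 1 .. 10;   # placeholder server list
my $workers = 2;
my $batch   = ceil(@servers / $workers);

# Placeholder for the real per-server log check: here it just
# appends one line to the shared error file under an exclusive lock.
sub check_server {
    my ($server) = @_;
    open(my $err, ">>", "errors.txt") or die $!;
    flock($err, LOCK_EX) or die $!;   # serialize writes across children
    print $err "checked $server\n";
    close($err);                      # closing releases the lock
}

unlink "errors.txt";
for my $w (0 .. $workers - 1) {
    # Give each worker a contiguous slice of the server list.
    my @mine = grep { defined }
               @servers[$w * $batch .. $w * $batch + $batch - 1];
    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {                  # child process
        check_server($_) for @mine;
        exit 0;
    }
}
wait() for 1 .. $workers;             # reap all children

open(my $in, "<", "errors.txt") or die $!;
my $count = 0;
$count++ while <$in>;
close($in);
print "$count servers checked\n";     # prints: 10 servers checked
```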

    Just an architecture suggestion.

Re: working with remote log files
by blakem (Monsignor) on Oct 14, 2001 at 01:34 UTC
    Assuming these are apache logs, a totally different approach can be found at mod_log_spread.

    To quote from that page:

      mod_log_spread is a patch to Apache's mod_log_config, which provides an interface for spread to multicast access logs. It utilizes the group communication toolkit Spread, developed at Johns Hopkins University's Center for Networking and Distributed Systems. mod_log_spread was developed to solve the problem of collecting consolidated access logs for large web farms. In particular, the solution needed to be scalable to hundreds of machines, utilize a reliable network transport, allow machines to be added or dropped on the fly, and impose minimal performance impact on the webservers. Current version is 1.0.4.

    -Blake