quasimojo321 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Brethren,

I am creating a daemonized perl executable to constantly scrape the contents of a database logfile. When all is said and done I will actually have to scrape about 124 of these logfiles. My initial thought was to use File::ReadBackwards to scrape the last line of every logfile. I would then parse that single line for known error codes. If it finds something interesting, it'll mail a warning and the line to a recipient. My question is this... I would like to start the perl daemon (let's refer to it as logdbd.pl hereafter) once and then have it spawn a sub-process for each database log that I want to scrape.

I know that I must use fork and exec to do this, but I've gotta say that I don't know exactly what I am doing. Should I create a main loop that does the last-line fetch and scrape, and then create a sub that iterates through my logfile array and re-executes the main loop using the next logfile name as the new argument? Remember that the execution of the main program should be endless. That being the case, would I ever be able to spark up the next child process, seeing as the first one to be started never exits?


I am only a fair-to-middling perl coder, and this is my first foray into a complex program in any language. Any help would be greatly appreciated. My sample code is as follows:

#!/usr/bin/perl -w
#
use strict;
use File::ReadBackwards;

my (@DBLOG,@SKIPCODES,$line,$logline,$list,$node,$pid,$i);

@DBLOG=qw( /bd01/systems/system.lg /ad04/orders/orders.lg
           /bd02/master/master.lg /cd01/billing/billing.lg
           /cd02/audit/audit.lg );

@SKIPCODES=qw( (43) (334) (354) (8826) (3803) (50) (5140) );

$list = join ("|", map { quotemeta } @SKIPCODES);
$node = `uname -n`;

foreach $i (@DBLOG) {
    unless (defined ($pid = fork())) {
        die "Can't fork process: $!";
    }
    unless ($pid) {                 # child
        LOGSCRAPE($i);
    }
}

################################
sub LOGSCRAPE {
    my $log = shift;
    $line = File::ReadBackwards->new($log)
        || die "Can't read from $log: $!";
    $logline = $line->readline;
    while ($logline) {
        next if $logline =~ /\W$list\W/;
        next if $logline =~ /^$/;
        system ("echo \"There is a problem with $log on $node. Here is the error code:\n\n $logline\""
              . " | mailx -s \"PROGRESS DATABASE PROBLEM!!!\" psmith\@xxxx.com");
        sleep 60;
    }
}


I would like to eventually substitute SNMP traps to a central console in lieu of emails, but one hurdle at a time. Again, any thoughts, opinions or advice would be very much appreciated.

Thanks,
Pat

Replies are listed 'Best First'.
Re: Creating a Deamonized Log Scraper
by ehdonhon (Curate) on Feb 07, 2002 at 23:17 UTC

    The concept seems sound. You don't need an exec() because you aren't passing control to any other programs. You might want to use qr// to compile your regular expression once instead of recompiling it on every match.
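
    For instance, a minimal sketch of what I mean (get_next_line() here is just a stand-in for however you fetch the next log line, not part of your original code):

        # Build the pattern once, outside the loop, and reuse the
        # compiled regex for every line.
        my $list    = join '|', map { quotemeta } @SKIPCODES;
        my $skip_re = qr/\W(?:$list)\W/;            # compiled once

        while ( defined( my $logline = get_next_line() ) ) {
            next if $logline =~ $skip_re;           # no recompilation here
            # ... handle interesting lines ...
        }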

    I think you need to re-evaluate what you are doing in your while loop. If either of those regular expressions matches, you'll have an infinite loop. It also doesn't look like you ever update the value of $logline, unless there is some magic in File::ReadBackwards that I'm not understanding.

    Although it might not be a good fit for you in this case, I recommend reading about Parallel::ForkManager. It is so very simple to use and is often a great choice whenever you are trying to speed up a job by running it in parallel.
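
    To give a feel for it, here is a bare-bones sketch of how Parallel::ForkManager is typically used (scrape_one_log() is just a placeholder for your per-file work):

        use Parallel::ForkManager;

        my $pm = Parallel::ForkManager->new(5);    # at most 5 children at once

        for my $logfile (@DBLOG) {
            my $pid = $pm->start and next;         # parent moves on to the next file
            scrape_one_log($logfile);              # child does the actual work
            $pm->finish;                           # child exits here
        }
        $pm->wait_all_children;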

Re: Creating a Deamonized Log Scraper
by rjray (Chaplain) on Feb 08, 2002 at 00:12 UTC

    I think there is a larger issue here, and that is the current approach you are taking could seriously impact the resources on the system you run it on. If you fork once for each file you are trying to monitor, and you truly end up monitoring in excess of 120 such files, you will run into overhead problems.

    As a first step, I would recommend that you structure the program around regularly polling the files for actual changes in the modification time. If all you do is continually look at the last line every n minutes, do you send multiple mail alerts for the same line merely because there hasn't been a new line in the last n + 1 minutes?
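
    Something roughly like this is what I have in mind (the %last_mtime hash and the check_last_line() helper are just illustrative names, not working code):

        my %last_mtime;
        while (1) {
            for my $logfile (@DBLOG) {
                my @st = stat $logfile or next;          # skip files we can't stat
                my $mtime = $st[9];                      # element 9 is the mtime
                next if defined $last_mtime{$logfile}
                     && $mtime == $last_mtime{$logfile}; # unchanged since last pass
                $last_mtime{$logfile} = $mtime;
                check_last_line($logfile);               # only now look at the log
            }
            sleep 60;                                    # poll interval
        }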

    Secondly, the only real need for forking is to prevent the sending of mail from stopping the main program from continuing. To this end, once you have a truly new line of output from one of the logs, check it against the pattern within the main program. If it warrants an e-mail alert, have the subroutine that sends the alert do the forking. Plus, there is no need for sleeping after the mail (unless this was part of the original design to wait a certain period before polling the logfile again). If the forked child process is only responsible for sending the e-mail, it can use exec() rather than system(). You will also want to do something with the CHLD signal in the parent process.
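
    A rough sketch of that arrangement, with send_alert() and the exact mailx invocation as illustrative stand-ins (it assumes the $node variable from your script, and the shell quoting is kept deliberately naive):

        $SIG{CHLD} = 'IGNORE';       # let the kernel reap the short-lived mail children

        sub send_alert {
            my ($logfile, $logline) = @_;
            my $pid = fork;
            die "Can't fork: $!" unless defined $pid;
            return if $pid;          # parent goes straight back to scanning

            # Child: replace itself with a shell that pipes the message to
            # mailx -- exec() rather than system(), so nothing lingers.
            my $msg = "There is a problem with $logfile on $node.\n\n$logline";
            exec '/bin/sh', '-c',
                qq{echo "$msg" | mailx -s 'PROGRESS DATABASE PROBLEM!!!' psmith\@xxxx.com};
            die "exec failed: $!";   # only reached if exec itself fails
        }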

    Getting back to the main point, I don't think you really want a one-to-one mapping of process-to-logfile, unless the machine running these processes is dedicated to just those tasks. This isn't a trivial task, especially not with the number of files you expect to be monitoring simultaneously. It is well worth the effort of taking some time to plan it through, and carefully.

    --rjray

Re: Creating a Deamonized Log Scraper
by Anonymous Monk on Feb 08, 2002 at 03:14 UTC
    Uh-oh, my chastened alarm is starting to buzz loudly!!! LOL. I think the both of you have some very good points, and maybe I haven't thought it through all the way. I did try to make this program so that it would not be system-intensive...

    My original instructions were to create a daemon that would continually scan the most current last line of seven database log files. Said databases are accessed continuously and rather heavily. The entries to the logs are made as they occur, not really on a timed basis. I basically want to process each new last line as it is generated and scrape only that new line for possible error codes. One of the requirements is that logdbd not miss any new last lines. By putting the scrape into an endless loop like that, I thought I would be getting as close to real-time error reporting as the CPU scheduler would allow, without having to scrape through the entire file to find the error on the last line. The sleep at the end is to give the operator some time to resolve the problem without flooding the system with emails; otherwise this daemon would keep sending messages every 100 microseconds until the message changed (talk about resources!!!). The regular expressions are there because the bulk of logfile entries are uninteresting, and it's easier (and faster) to eliminate what I don't want to see than to look for the many errors I do find "interesting".

    My manager and I had discussed the two ways to do this, my way and his, which was to create one single-threaded process that sequentially checked the last line of every database log. I felt that would not meet the instant-reporting, always-"on" requirement we were being asked to deliver (what messages might I be missing in log-1 while I cycled through log-50 through log-100?). I really just want to get this thing working on ten logfiles so I can benchmark it, and then write it his way and benchmark that.

    I thank you for your input and am eager for any suggestions you might make regarding this thought process. I am still new to this language and may be trying to misapply it in some manner here. Please let me know.
    Thanks,
    Pat

      But your method doesn't eliminate the possibility of missing lines. You still have a race condition. What if your log got 20 lines appended to it during that 60-second sleep? You'd only see the last line.

      It sounds to me like you want to emulate 'tail -f' on your log files and then continuously cycle through them (perhaps in a multi-threaded fashion) looking for updates. I agree with the previous poster, though, that a one-to-one relationship between processes and log files isn't such a good idea. I would suggest building a system where the number of children is configurable and they all share the job of scanning your logs. Each time a child comes to a log, it would need to pick up on the filehandle where it left off and scan till it gets to the end, then move on to whatever log it should do next.
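
      Something along these lines, roughly (the %position hash and check_line() are just placeholder names, not working code):

          my %position;       # byte offset we have already scanned, per logfile

          sub scan_log {
              my ($logfile) = @_;
              open my $fh, '<', $logfile or return;
              seek $fh, $position{$logfile} || 0, 0;    # resume where we stopped
              while (my $logline = <$fh>) {
                  check_line($logfile, $logline);       # pattern matching goes here
              }
              $position{$logfile} = tell $fh;           # remember for the next visit
              close $fh;
          }

      If a log ever gets truncated or rotated, the saved offset would need to be reset, but that's the general shape of it.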

Re: Creating a Deamonized Log Scraper
by quasimojo321 (Initiate) on Feb 08, 2002 at 21:27 UTC


    Just to wrap this one up for posterity (for now at least), here is the final beta code for the little project. MAJOR concerns notwithstanding, the code seems to function the way it's supposed to. It starts up seven logfile scrapers that run continuously in the background, looking at the last line of each logfile and signaling when they see something they don't like. I will be adding code to track and kill the children should the need arise (probably using Ctrl-C to send a signal, as I have seen previously demonstrated on this site). I will also benchmark the activity of this daemon and let you know how it affects a copy of our live environment, and I will try to address the concerns members raised regarding race conditions and resource utilization. As you can see, I have implemented some of your suggestions already...

    Without further ado, here is the code:
    #!/usr/bin/perl -w
    #
    use strict;
    use Parallel::ForkManager;
    use File::ReadBackwards;
    #
    my (@DBLOG,@SKIPCODES,$logline,$line,$list,$node,$pm,$i);
    #
    @DBLOG=qw( /bd01/systems/systems.lg /ad04/orders/orders.lg
               /bd02/master/master.lg /cd01/billing/billing.lg
               /cd02/audit/audit.lg /bd01/custom/custom.lg
               /bd01/apprules/apprules.lg );
    #
    @SKIPCODES=qw( lots of inconsequential stuff that I want to ignore );

    $list = join ("|", map { quotemeta } @SKIPCODES);
    $node = `uname -n`;
    $pm   = new Parallel::ForkManager(7);

    foreach $i (@DBLOG) {
        my $pid = $pm->start and next;    # parent moves on; child falls through
        while (1) {
            $line = File::ReadBackwards->new($i)
                || die "Can't read from $i: $!";
            $logline = $line->readline;
            next if $logline =~ /\W$list\W/;
            next if $logline =~ /^$/;
            system ("echo \"There is a problem with database logfile $i on $node. Here is the error:\n\n $logline\""
                  . " | mailx -s \"PROGRESS DATABASE PROBLEM!!!\" psmith\@xxxxx.com");
            sleep 120;
        }
    }
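
    For the child tracking and Ctrl-C handling I mentioned, this is a rough sketch of the direction I'm leaning (placeholder names only, and it sends TERM rather than KILL so the children can exit cleanly):

        # Remember each child's pid as it is started, and pass an
        # interrupt from the parent along to all of them.
        my @kids;
        $SIG{INT} = sub {
            kill 'TERM', @kids;      # ask every child to stop
            exit 0;
        };

        foreach $i (@DBLOG) {
            my $pid = $pm->start;
            if ($pid) {              # parent: note the child, move on
                push @kids, $pid;
                next;
            }
            # ... child's scrape loop, exactly as above ...
            $pm->finish;
        }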


    Hope this is of some use to someone at some point. Code On!!!

    Pat