perl_devel has asked for the wisdom of the Perl Monks concerning the following question:

I have a requirement to loop through an error log and, for each entry, find a matching entry (using regular expressions) in the access log.

Since the access log (8 lakh lines) is huge compared to the error log (7 thousand lines), I do the search in reverse: I loop through the access log once and, for each entry, find the closest match in the error log based on the time and the error.

Initially the script took 12 hours. After several rounds of tuning, such as comparing timestamps and skipping entries in the access log to reach the next error log entry, the execution time is down to 4 hours.

Is there any other way to reduce the time and boost the performance of the code?

Replies are listed 'Best First'.
Re: Working with Access Logs
by superfrink (Curate) on Feb 28, 2006 at 11:44 UTC
    It's really hard to say where to look to speed up code that we haven't seen. Maybe have a look at using Apache::LogRegex.

    Interestingly, I recently learned that one lakh is equal to a hundred thousand (i.e. 10^5). As far as I know, "lakh" is not commonly used in North America.
Re: Working with Access Logs
by marto (Cardinal) on Feb 28, 2006 at 11:17 UTC
    perl_devel,

    Before anyone can suggest 'other way to reduce the time and boost the performance of the code' you should post the code in question. That way people can suggest changes which could be made. Take a look at How do I post a question effectively? regarding posting pertinent information.

    Martin
      I have listed the code extract. Can anyone suggest a way to boost the performance further?

        perl_devel,

        You have been a user here long enough to know that you should not use PRE tags when posting code. This is also pointed out to you when you 'preview' a post. Please reread the PerlMonks FAQ and Writeup Formatting Tips.

        Update: Thanks for swapping pre for code tags. However, if you look at How do I change/delete my post? (linked from the PerlMonks FAQ I mentioned above) you will find the conventions used here for updating nodes you have written.

        Martin
Re: Working with Access Logs
by salva (Canon) on Feb 28, 2006 at 12:02 UTC
    As logs are almost sorted by date, you could probably limit your search on the error log to a small window:
    my $window = 300; # 5 min.
    my @errors;
    my @timestamps;
    while (<ACCESSLOG>) {
        my $timestamp = get_timestamp($_);
        while (!@timestamps or $timestamps[-1] < $timestamp + $window) {
            if (defined(my $err = <ERRORLOG>)) {
                push @errors, $err;
                push @timestamps, get_timestamp($err);
            }
            else {
                last;
            }
        }
        while (@timestamps and $timestamps[0] < $timestamp - $window) {
            shift @errors;
            shift @timestamps;
        }
        for my $error (@errors) {
            # check if access line $_ matches
            # with error line $error here:
            ...
        }
    }
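    The snippet above calls get_timestamp without defining it. A minimal sketch of such a helper might look like the following. It assumes the access log uses Apache's Common Log Format timestamp ([28/Feb/2006:11:44:00 +0000]) and the error log uses the ctime-style stamp ([Tue Feb 28 11:44:00 2006]); neither format is confirmed in this thread. It also assumes both logs come from the same server, so it ignores the timezone offset and treats both stamps as the same wall clock. The %MONTH table is a name introduced only for this illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation -> zero-based month number, as timegm expects.
my %MONTH = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper: returns a number of seconds that is comparable
# between access-log and error-log lines, or undef if no timestamp is found.
# Assumption: both logs are written in the same timezone, so the offset
# in the access log is deliberately ignored.
sub get_timestamp {
    my ($line) = @_;

    # Access log, Common Log Format: [28/Feb/2006:11:44:00 +0000]
    if (my ($d, $mon, $y, $h, $mi, $s) =
            $line =~ m{\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) [+-]\d{4}\]}) {
        return undef unless exists $MONTH{$mon};
        # timegm croaks on impossible dates; eval turns that into undef.
        return scalar eval { timegm($s, $mi, $h, $d, $MONTH{$mon}, $y) };
    }

    # Error log, ctime style: [Tue Feb 28 11:44:00 2006]
    if (my ($mon, $d, $h, $mi, $s, $y) =
            $line =~ m{\[\w{3} (\w{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (\d{4})\]}) {
        return undef unless exists $MONTH{$mon};
        return scalar eval { timegm($s, $mi, $h, $d, $MONTH{$mon}, $y) };
    }

    return undef;
}
```

    Because the helper can return undef for lines it does not recognise, the windowing loop would probably want to skip such lines before comparing timestamps.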
      In the past I have used a C program called mergelog to merge and sort Apache CLF (Common Log Format) log files. I used this when I had multiple web servers serving a set of websites.