vxp has asked for the wisdom of the Perl Monks concerning the following question:

Here's the thing:

I've a log file. I need to count how many times a particular string appeared in that log file within the past 1 hour.

So I employed File::ReadBackwards for that purpose.

However, I get very weird results!

Judge for yourself, code (and output from it) below

#!/usr/bin/perl # Checks for proactive hits within the past hour use File::ReadBackwards; use POSIX "strftime"; my $success; my $items; my $earliest = POSIX::strftime( "%Y-%m-%d %H:%M:%S", localtime( time() + - 60 * 60 ) ); print "$earliest\n"; $bw = File::ReadBackwards->new( '/software/logs/CCASL/ccasl.log' ) or die "can't read '/software/logs/CCASL/ccasl.lo +g' $!" ; while ( defined( my $log_line = $bw->readline() ) ) { last if $log_line lt $earliest; if ($log_line =~ /ProcessHTTPRequestImpl/) { print $log_line; } }

Here's the output you get when you run it:

[root@lpo-tomcat-09 ~]# prohits.pl 2007-07-10 12:17:22 2007-07-10 13:17:05,879 (rocessor23) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272332 catTag:10006 caller:rtr2_CAP1_1a w +arp:true 2007-07-10 13:17:05,750 (rocessor16) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272334 catTag:10006 caller:rtp2_CAP3_1a w +arp:true 2007-07-10 13:17:05,742 (rocessor24) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272326 catTag:10006 caller:rtr2_CAP1_1a w +arp:true 2007-07-10 13:17:05,627 (rocessor16) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272259 catTag:10006 caller:rtp2_CAP4_1a w +arp:true [root@lpo-tomcat-09 ~]#

As you see, $earliest is set to the right string - one hour ago. But it only prints stuff that happened within the last minute, not last hour!

Any ideas?

Replies are listed 'Best First'.
Re: Problem using File::ReadBackwards
by ysth (Canon) on Jul 10, 2007 at 17:34 UTC
    I'd guess you have a line that doesn't start with a timestamp. How about
    last if $log_line =~ /^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d/ && $log_line lt + $earliest;
    Or make your test code print out the line that made it stop:
    if ($log_line lt $earliest) { use Data::Dumper; $Data::Dumper::Useqq = 1; print Dumper("stopped on $log_line"),"\n"; }
Re: Problem using File::ReadBackwards
by thezip (Vicar) on Jul 10, 2007 at 17:46 UTC

    vxp,

    I'd say you might have a problem with your datafile in that it contains something that you are not expecting/handling.

    Try removing lines or chunks of lines from the data file until your problem goes away. When it begins to work, add the lines back in until it breaks again to detemine the offending line(s).

    Alternatively, construct a bogus and properly formatted file that *must* work, and then begin inserting lines that could cause it to break.


    Where do you want *them* to go today?

      You all were right.

      It turned out the log file had empty lines in it, for whatever reason our developers decided to stick them in there.

      I easily solved it by simply doing:

      next if $log_line =~ /^$/;
      prior to
      last if $log_line lt $earliest;
        ysth is dead right. Why check for the one case which you know doesn't work and exclude it? It would be safer (in case your designers do something else weird) to look for the condition that satisfies your requirements.

        I'll even save you the typing:
        while ( defined( my $log_line = $bw->readline() ) ) { my ($logTime) = ($log_line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{ +2}).*ProcessHTTPRequestImpl/); if ($logTime){ last if $logTime lt $earliest; print $log_line; } }
        Please don't use:
        next if $log_line =~ /^$/;
        ... I shudder just thinking about it :-)
Re: Problem using File::ReadBackwards
by ikegami (Patriarch) on Jul 10, 2007 at 17:32 UTC
    It would be nice to see the line that causes the last to be executed.
      Here's the result of a simple grep:
      2007-07-10 13:27:05,597 (rocessor23) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272701 catTag:10007,10006 caller:rtp2_CAP +4_1a warp:true 2007-07-10 13:27:05,928 (rocessor25) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272750 catTag:11543 caller:rtp2_CAP4_1a w +arp:true 2007-07-10 13:29:05,620 (rocessor16) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272772 catTag:10006 caller:rtp1_CAP4_1a w +arp:true 2007-07-10 13:29:05,638 (rocessor25) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272779 catTag:10007,10006 caller:rtp1_CAP +2_1a warp:true 2007-07-10 13:29:05,715 (rocessor24) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272809 catTag:10006 caller:rtp1_CAP1_1a w +arp:true 2007-07-10 13:31:05,559 (rocessor25) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272887 catTag:10007,10006 caller:rtp1_CAP +1_1a warp:true 2007-07-10 13:31:05,608 (rocessor24) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272893 catTag:10007,10006 caller:rtp1_CAP +2_1a warp:true 2007-07-10 13:31:05,747 (rocessor23) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272955 catTag:10007,10006 caller:rtp1_CAP +1_1a warp:true 2007-07-10 13:33:05,342 (rocessor24) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272849 catTag:11543 caller:rtp1_CAP1_1a w +arp:true 2007-07-10 13:33:05,355 (rocessor25) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272841 catTag:10007,10006 caller:rtp1_CAP +2_1a warp:true 2007-07-10 13:33:05,385 (rocessor23) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272987 catTag:10006 caller:rtr2_CAP1_1a w +arp:true 2007-07-10 13:33:05,404 (rocessor16) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272964 catTag:10007,10006 caller:rtp1_CAP +4_1a warp:true
      Here's where the script stops displaying:
      [root@lpo-tomcat-09 ~]# prohits.pl 2007-07-10 12:34:40 2007-07-10 13:33:05,404 (rocessor16) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272964 catTag:10007,10006 caller:rtp1_CAP +4_1a warp:true 2007-07-10 13:33:05,385 (rocessor23) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272987 catTag:10006 caller:rtr2_CAP1_1a w +arp:true 2007-07-10 13:33:05,355 (rocessor25) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272841 catTag:10007,10006 caller:rtp1_CAP +2_1a warp:true 2007-07-10 13:33:05,342 (rocessor24) INFO [ProcessHTTPRequestImpl] Ca +llOut Params: DREREFERENCE:6272849 catTag:11543 caller:rtp1_CAP1_1a w +arp:true [root@lpo-tomcat-09 ~]#

        Here's the result of a simple grep:

        Using grep means that you are removing some lines, while your program does not. You might very well have removed the line that causes the problem. You repeated the assumption I suggested you verify.

        if ($log_line lt $earliest) { print("DEBUG -- last:\n"); print($log_line); last; }

        I bet you won't find a timestamp at the begining of that line.

        I can't think of anything you could be grepping on that wouldn't hide the lines that would be causing you trouble.