
All:
For those of you who are interested in JUST the problem and not the background - skip to the text identified with PROBLEM.

I just finished a month-long project that added a suite of related scripts to our "toolbox" arsenal at work. Of course, the second I was done and put it into production, I received a thousand feature requests from co-workers. Some of these requests were easy to accommodate, while others require re-designing and won't be done for a long time to come. One of them seems deceptively simple, and I was hoping you all could help me with it.

One of the scripts monitors a directory for transient files (race condition) containing "bad" information, moves them to another directory, and logs each move. Since this needed to be super duper ultra fast, I wrote as little to the log file as possible. For instance:

Instead of giving it a human readable date/time stamp - I used seconds since epoch.

I left out two columns (directory name & direction) so that print didn't have to interpolate two variables.

Now, many instances of the above script are running on multiple directories, so there are multiple logs. I wrote a companion script to parse those logs and put them in human-readable format. For instance:

The timestamp is converted back into human readable format

The directory name and direction are inserted into the output, derived from the log's filename/path.
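As a rough illustration of those two conversions, here is a minimal sketch rather than the production code. The name `format_entry` is hypothetical, and the explicit date format is an assumption standing in for the locale-dependent `%x-%X` that the real script uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: turn one raw log line, plus the path of the log it
# came from, into the human-readable form.
sub format_entry {
    my ($line, $path) = @_;
    my ($epoch, $file, $size, $name) = split ' ', $line;

    # Connector and direction are not in the log; they come from the path.
    my ($conn, $dir) = $path =~ m{([^/]+)/trap_([^/]+)\.log$}
        or return;

    # Assumed fixed format; the real script uses the locale's %x-%X.
    my $time = strftime('[%m/%d/%y-%H:%M:%S]', localtime $epoch);
    return "$time $conn $dir $name $file $size";
}

print format_entry('1044007259 do15505x 467 PaulRidge',
                   '/var/spool/wt400/log/dir2/trap_out.log'), "\n";
```

The printed time depends on the local time zone, so no expected output is shown.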

The script, which you can read here:

#!/usr/bin/perl -w
use strict;
use Getopt::Std;
use Time::Local;
use POSIX qw(strftime);
$|++;
my %Opt;
my @Conns;

&GetArgs();
&GetConns();
&GetLogs();

sub GetArgs {
  my $Usage = qq{Usage: $0 [options]
        -h : This help message.
        -c : Specific connector - default is to list all connectors.
        -d : Specific direction - default is to list all directions
        -n : Trap name - default is to list all names
        -t : Time in stamp format mm/dd/yy-hh:mm or mm/dd/yy
                +<stamp> - show entries created after specified stamp
                        If time is not given, defaults to 23:59
                -<stamp> - show entries created before specified stamp
                        If time is not given, default to 00:00
                =<stamp> - show entries created on specified stamp
                        If time is not given it is ignored (all day)
                <stamp>+-<stamp> - show entries created between specified stamps
                        If time is not given on first stamp, 00:00 is used
                        If time is not given on second stamp, 23:59 is used
                        Note:  This includes the day(s) specified
        -s : Size of files caught in bytes
                +<size> - show entries with files larger than specified size
                -<size> - show entries with files smaller than specified size
                =<size> - show entries with files equal to specified size
                <size>+-<size> - show entries with files between specified sizes
  } . "\n";

  getopts( 'hc:d:n:t:s:', \%Opt ) or die "$Usage";
  die "$Usage" if $Opt{h};
  if ($Opt{d}) {
    $Opt{d} = lc($Opt{d});
    die "$Usage" if ($Opt{d} ne "in" && $Opt{d} ne "out" && $Opt{d} ne "both");
  }
}


sub GetConns {
  open (CONNECTORS,"/var/wt400/conf/_wtd.cfg") or die "\nUnable to open connector file!\n";
    while (<CONNECTORS>) {
      next unless ($_ =~ /^unit="(.*)"/);
      my $Conn = lc($1);
      next if ($Conn eq "ins" || $Conn eq "ins2" || $Conn eq "_wtd");
      push @Conns , $Conn;
    }
  close (CONNECTORS);
  if ($Opt{c}) {
    $Opt{c} = lc($Opt{c});
    # Exact comparison, so a partial name cannot match a longer connector
    if (grep { $_ eq $Opt{c} } @Conns) {
      @Conns = $Opt{c};
    }
    else {
      die "\nInvalid connector - $Opt{c} !\n";
    }
  }
}

sub GetLogs {
  my @Logs;
  foreach my $Conn (@Conns) {
    my @Directions;
    if ($Opt{d}) {
      @Directions = $Opt{d};
    }
    else {
      @Directions = (qw(in out both));
    }
    foreach my $Dir (@Directions) {
      push @Logs , "/var/spool/wt400/log/$Conn/trap_${Dir}.log" if (-r "/var/spool/wt400/log/$Conn/trap_${Dir}.log" && -s _);
    }
  }
  unless (@Logs) {
    die "\nUnable to find any logs!\n";
  }
  else {
    while (my $File = shift @Logs) {
      my($mon, $day, $year, $hour, $min);
      open(LOG,$File) or die "\nUnable to open $File: $!\n";
      LINE:
      while (my $Line = <LOG>) {
        chomp $Line;
        my @Fields = split " " , $Line;
        if ($Opt{n}) {
          next unless (lc($Opt{n}) eq lc($Fields[3]));
        }
        if ($Opt{t}) {
          $Opt{t} =~ s/\s+//g;
          my $Stamp1;
          my $Stamp2;
          # defined() is used below so that an explicit 00 hour or minute
          # still counts as "time given"
          if ($Opt{t} =~ /^\+(.*)/) {
            ($mon, $day, $year, $hour, $min) = split /[-\/:]/ , $1;
            ($hour,$min) = (23,59) unless (defined $hour && defined $min);
            $Stamp1 = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);
            next unless ($Fields[0] > $Stamp1);
          }
          elsif ($Opt{t} =~ /^-(.*)/) {
            ($mon, $day, $year, $hour, $min) = split /[-\/:]/ , $1;
            ($hour,$min) = (0,0) unless (defined $hour && defined $min);
            $Stamp1 = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);
            next unless ($Fields[0] < $Stamp1);
          }
          elsif ($Opt{t} =~ /^=(.*)/) {
            ($mon, $day, $year, $hour, $min) = split /[-\/:]/ , $1;
            # No time given means the whole day: 00:00 through 23:59
            my $AllDay = !(defined $hour && defined $min);
            ($hour,$min) = (0,0) if $AllDay;
            $Stamp1 = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);
            ($hour,$min) = (23,59) if $AllDay;
            $Stamp2 = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);
            next unless ($Fields[0] >= $Stamp1 && $Fields[0] <= $Stamp2);
          }
          elsif ($Opt{t} =~ /^(.*)\+-(.*)/) {
            # Copy the captures before any further pattern matching
            my ($From, $To) = ($1, $2);
            ($mon, $day, $year, $hour, $min) = split /[-\/:]/ , $From;
            ($hour,$min) = (0,0) unless (defined $hour && defined $min);
            $Stamp1 = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);
            ($mon, $day, $year, $hour, $min) = split /[-\/:]/ , $To;
            ($hour,$min) = (23,59) unless (defined $hour && defined $min);
            $Stamp2 = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);
            next unless ($Fields[0] >= $Stamp1 && $Fields[0] <= $Stamp2);
          }
        }
        if ($Opt{s}) {
          $Opt{s} =~ s/\s+//;
          if ($Opt{s} =~ /^\+(.*)/) {
           next unless ($Fields[2] > $1);
          }
          elsif ($Opt{s} =~ /^\-(.*)/) {
           next unless ($Fields[2] < $1);
          }
          elsif ($Opt{s} =~ /^\=(.*)/) {
            next unless ($Fields[2] == $1);
          }
          elsif ($Opt{s} =~ /^(.*)\+\-(.*)/) {
            next unless ($Fields[2] >= $1 && $Fields[2] <= $2 );
          } 
        }
        if ($File =~ /^.*\/(.*)\/trap_(.*)\.log/) {
          my $Conn = $1;
          my $Dir = $2;
          my $Time = strftime("[%x-%X]",localtime($Fields[0]));
          print "$Time $Conn $Dir $Fields[3] $Fields[1] $Fields[2]\n";
        }
      }
    }
  }
}

It allows looking at just one log or, if no option is specified, all the logs at once, as well as filtering for only specific information.

PROBLEM

My co-workers would like to have the logs interleaved (sorted chronologically) if they are displaying all the logs at once. This seems incredibly easy since the first column is a timestamp and would automatically sort chronologically. The problem is that there are two columns in the output (see below) that are dynamically generated from the filename/path.

RAW:
1044007259 do15505x 467 PaulRidge
1044022188 do15667s 876 Tom-Snow
1044029052 do15854j 3228 BCorcoran
FORMATTED:
[01/31/03-11:41:28] DIR1 out MarkLester doqnh6y5 10300
[01/31/03-16:28:20] DIR1 out BrianSmith doavr564 8353
[01/31/03-16:38:12] DIR1 out MarkLester doavr5g4 9663
[01/30/03-23:02:08] DIR2 out PaulRidge do15347q 2394
[01/30/03-23:02:08] DIR2 out PaulRidge do15347t 492

Note: The raw and formatted are samples and do not represent the same data.

Since my code currently reads each file in one at a time, it is able to dynamically generate these two columns. To sort all the results chronologically means that I would have to read them all in first (they could get quite large), perform the sort, and print out the output. I thought of the following alternative options:

Change the first script to interpolate the variables and include them in the log (NOT desirable).

Read each log in, parse and format, append to a single temporary file, read the file back in using a pipe as a file handle open(LOGS, "sort <file> |");, display the results, and remove the temporary file.

Tell my co-workers to just pipe (|) the output to sort ("what, you mean Perl can't do something?" as they are always quick to say).

Tie the files to a hash, merge, sort, display, untie (seems too busy and inefficient, but I could be wrong)
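To make the temporary-file option concrete, here is a minimal sketch under a few assumptions. The name `interleave_logs` is hypothetical, an external `sort` that accepts `-n` is on the PATH, and the raw epoch is written in front of each formatted line as the sort key (the formatted mm/dd/yy stamp would not sort correctly across years) and stripped again on the way out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX qw(strftime);

# Format every log into one temporary file, then read it back through an
# external numeric sort. $out is the filehandle that receives the result.
sub interleave_logs {
    my ($out, @logs) = @_;
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);  # removed at exit

    for my $path (@logs) {
        # Connector and direction are still known here, per file.
        my ($conn, $dir) = $path =~ m{([^/]+)/trap_([^/]+)\.log$} or next;
        open my $log, '<', $path or die "Unable to open $path: $!\n";
        while (my $line = <$log>) {
            my ($epoch, $file, $size, $name) = split ' ', $line;
            next unless defined $name;                # skip short lines
            # Assumed fixed format in place of the locale's %x-%X.
            my $time = strftime('[%m/%d/%y-%H:%M:%S]', localtime $epoch);
            print {$tmp_fh} "$epoch $time $conn $dir $name $file $size\n";
        }
        close $log;
    }
    close $tmp_fh or die "Unable to write $tmp_name: $!\n";

    # List-form pipe open: no shell is involved.
    open my $sorted, '-|', 'sort', '-n', $tmp_name
        or die "Unable to run sort: $!\n";
    while (my $line = <$sorted>) {
        $line =~ s/^\d+ //;                           # drop the sort key
        print {$out} $line;
    }
    close $sorted;
}

interleave_logs(\*STDOUT, @ARGV);
```

Only one line at a time is held in memory by Perl; the sorting cost is pushed onto the external `sort` and the temporary file.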

What I would like to do is open (LOGS,"sort @Logs |");, but then I would lose the filename/path and wouldn't be able to generate those two columns.
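For comparison, the path can be kept if the merge is done in Perl with one open handle per log. This is only a sketch, and it assumes each individual log is already in chronological order (the raw sample looks that way, but I have not verified it for every log). The names `read_entry` and `merge_logs` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the next timestamped line of one source; returns 0 at end of file.
sub read_entry {
    my ($src) = @_;
    while (defined(my $line = readline $src->{fh})) {
        next unless $line =~ /^(\d+)\s/;   # skip lines without a timestamp
        chomp $line;
        @{$src}{qw(epoch line)} = ($1, $line);
        return 1;
    }
    return 0;
}

# Merge logs that are each sorted by their leading epoch column. Only one
# pending line per log is held in memory. $emit is called with the path
# and the raw line, so the caller can still derive connector/direction.
sub merge_logs {
    my ($emit, @paths) = @_;
    my @src;
    for my $path (@paths) {
        open my $fh, '<', $path or die "Unable to open $path: $!\n";
        my $src = { path => $path, fh => $fh };
        push @src, $src if read_entry($src);
    }
    while (@src) {
        # Pick the source whose pending line has the smallest timestamp.
        my ($next) = sort { $a->{epoch} <=> $b->{epoch} } @src;
        $emit->($next->{path}, $next->{line});
        # Refill that source, or drop it once it is exhausted.
        @src = grep { $_ != $next } @src unless read_entry($next);
    }
}

merge_logs(sub { print "$_[0]: $_[1]\n" }, @ARGV);
```

Re-sorting the pending lines on every pass is fine for a handful of logs; with a very large number of logs a heap would be the usual replacement.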

Any advice (besides my regular expressions in parsing my data)?

Thanks in advance, L~R

In reply to Log parsing by timestamp dilema by Limbic~Region
