Parsing Pattern Question

dlcasey has asked for the wisdom of the Perl Monks concerning the following question:


Re: Parsing Pattern Question
by ikegami (Patriarch) on Sep 03, 2009 at 14:34 UTC

Parent post as it stood when I replied

Hi, Perl newb here... I have an application log from which I'm trying to retrieve a specific sequence of 4 lines
(event1, event2, event3, event4, discarding any duplicates in between).
I've already filtered the log to get all occurrences of the four lines I want,
but I don't know how to pull out the four lines when they happen sequentially.
Example:

09:12:50:861 EVENT1 #Don't want this line
09:13:09:467 EVENT1 #Don't want this line
09:13:09:837 EVENT1
09:13:38:059 EVENT2
09:14:03:115 EVENT3
09:14:04:076 EVENT4
09:14:11:376 EVENT1
09:14:34:049 EVENT2
09:14:34:990 EVENT3
09:14:34:990 EVENT3 #Don't want this line
09:14:34:990 EVENT4

I then need to do a time calculation between each of the four events. (I can do this part; I just can't
figure out how to pull out the four lines I want every time they appear in the sequential order I am looking for.)
Can anyone help?
Thanks!

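use strict;
use warnings;

my $last;          # previous input line, used to drop exact duplicates
my @history;       # lines of the sequence being assembled
my $expect = 1;    # event number we expect next
while (<>) {
    chomp;

    my ($num) = /EVENT(\d+)/
        or next;  # Ignore bad input.

    next if defined($last) && $last eq $_;  # skip exact duplicate lines
    $last = $_;

    # An EVENT1 always starts a new sequence; any out-of-order
    # event abandons the partial one.
    if ($num == 1 || $num != $expect) {
        @history = ();
        $expect = 1;
    }

    if ($num == $expect) {
        push @history, "$_\n";
        if ($expect++ == 4) {    # sequence complete: emit it
            print(@history);
            @history = ();
            $expect = 1;
        }
    }
}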

Update: Handle duplicates as requested.
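The question also asks for the time between each pair of consecutive events in a matched sequence. Here is a minimal sketch, assuming every line starts with an HH:MM:SS:mmm timestamp (hours, minutes, seconds, milliseconds) as in the sample, and that a sequence never crosses midnight. The names to_ms and gaps_ms are hypothetical helpers, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a leading "HH:MM:SS:mmm" timestamp into
# milliseconds since midnight. Dies if the line has no such timestamp.
sub to_ms {
    my ($line) = @_;
    my ($h, $m, $s, $ms) = $line =~ /^(\d\d):(\d\d):(\d\d):(\d{3})/
        or die "no timestamp in: $line\n";
    return (($h * 60 + $m) * 60 + $s) * 1000 + $ms;
}

# Hypothetical helper: given the lines of one matched sequence, return
# the elapsed milliseconds between each consecutive pair of lines.
# Assumes the sequence does not span midnight.
sub gaps_ms {
    my @times = map { to_ms($_) } @_;
    return map { $times[$_] - $times[$_ - 1] } 1 .. $#times;
}

my @sequence = (
    '09:13:09:837 EVENT1',
    '09:13:38:059 EVENT2',
    '09:14:03:115 EVENT3',
    '09:14:04:076 EVENT4',
);
print join(', ', gaps_ms(@sequence)), " ms\n";
```

With the first sequence from the sample this prints 28222, 25056, 961 ms. The lines could be taken straight from @history just before it is printed in the loop above.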


Re: Parsing Pattern Question
by BrowserUk (Patriarch) on Sep 03, 2009 at 14:02 UTC

Your spec is inconsistent.

In the first instance you discard 'EVENT1's (bar the last), until you see an 'EVENT2'.
In the second, you keep the first instance of 'EVENT3', and discard dups until you get an 'EVENT4'.

I can see one rule that might explain that process, but it's better that you explain when to discard or retain an earlier (near-)duplicate than have me (us) guess.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP PCW -- It is as I've been saying! (Audio until 20090817)


Re: Parsing Pattern Question
by arun_kom (Monk) on Sep 03, 2009 at 15:54 UTC

Assuming that the entries in the log file are ordered by time (which is the case in your test data), and that you only want to keep the last entry of a particular event, the following would work.

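#!/usr/bin/perl

use strict;
use warnings;

my %log;    # event name => most recent line for that event

while (<DATA>) {
  chomp;
  next if /^$/;                  # skip blank lines
  next unless /.+(EVENT\d)/;     # skip lines without an event tag
  $log{$1} = $_;                 # later lines overwrite earlier ones
}

print map { "$log{$_}\n" } sort keys %log; 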

__DATA__
09:12:50:861 EVENT1 #Don't want this line
09:13:09:467 EVENT1 #Don't want this line
09:13:09:837 EVENT1
09:13:38:059 EVENT2
09:14:03:115 EVENT3
09:14:04:076 EVENT4
09:14:11:376 EVENT1
09:14:34:049 EVENT2
09:14:34:990 EVENT3 #Don't want this line
09:14:34:990 EVENT3
09:14:34:990 EVENT4

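Because %log is keyed only by event name, it keeps one line per event for the whole file, so on the sample data it prints only the second sequence (starting with the EVENT1 at 09:14:11:376). Below is a minimal sketch of the same keep-the-latest idea applied per group, assuming each group is closed by an EVENT4 and is reported only if EVENT1, EVENT2 and EVENT3 were all seen since the previous EVENT4. The order of events inside a group is not checked. collect_groups is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: split log lines into groups. Within a group the
# latest line for each event wins (as in the %log version); an EVENT4
# closes the group. Groups missing EVENT1..EVENT3 are dropped.
sub collect_groups {
    my @groups;
    my %current;    # event name => most recent line in this group
    for my $line (@_) {
        my ($event) = $line =~ /(EVENT\d)/
            or next;    # skip lines without an event tag
        $current{$event} = $line;
        next unless $event eq 'EVENT4';
        if (3 == grep { exists $current{$_} } qw(EVENT1 EVENT2 EVENT3)) {
            push @groups, [ @current{qw(EVENT1 EVENT2 EVENT3 EVENT4)} ];
        }
        %current = ();  # start a fresh group after each EVENT4
    }
    return @groups;
}

my @lines = (
    '09:12:50:861 EVENT1', '09:13:09:467 EVENT1', '09:13:09:837 EVENT1',
    '09:13:38:059 EVENT2', '09:14:03:115 EVENT3', '09:14:04:076 EVENT4',
    '09:14:11:376 EVENT1', '09:14:34:049 EVENT2', '09:14:34:990 EVENT3',
    '09:14:34:990 EVENT4',
);
for my $group (collect_groups(@lines)) {
    print "$_\n" for @$group;
    print "\n";
}
```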

Re: Parsing Pattern Question
by ig (Vicar) on Sep 03, 2009 at 19:22 UTC

Here's another way.

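use strict;
use warnings;

my ($lastline, $lastevent);
while (<DATA>) {
    chomp;
    next unless(/EVENT(\d)/);
    # The event number changed, so the previous line ended a run: print it.
    print "$lastline\n" if(defined($lastevent) and $1 ne $lastevent);
    $lastevent = $1;
    $lastline = $_;
}
# Print the last line of the final run (if there was any input).
print "$lastline\n" if defined $lastline;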

__DATA__
09:12:50:861 EVENT1 #Don't want this line
09:13:09:467 EVENT1 #Don't want this line
09:13:09:837 EVENT1
09:13:38:059 EVENT2
09:14:03:115 EVENT3
09:14:04:076 EVENT4
09:14:11:376 EVENT1
09:14:34:049 EVENT2
09:14:34:990 EVENT3 #Don't want this line
09:14:34:990 EVENT3
09:14:34:990 EVENT4

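The loop above keeps the last line of each run of lines with the same event number. The same rule as a reusable function makes it easy to check against sample data. This is a sketch; last_of_runs is a hypothetical name, and lines without an EVENTn tag are skipped as in the original.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last line of each run of consecutive
# lines that carry the same EVENTn number.
sub last_of_runs {
    my @kept;
    my ($lastline, $lastevent);
    for my $line (@_) {
        my ($event) = $line =~ /EVENT(\d)/
            or next;    # skip lines without an event tag
        # A change of event means the previous run has ended.
        push @kept, $lastline
            if defined $lastevent && $event ne $lastevent;
        ($lastevent, $lastline) = ($event, $line);
    }
    push @kept, $lastline if defined $lastline;   # close the final run
    return @kept;
}

my @result = last_of_runs(
    '09:12:50:861 EVENT1',
    '09:13:09:467 EVENT1',
    '09:13:09:837 EVENT1',
    '09:13:38:059 EVENT2',
);
print "$_\n" for @result;
```

This prints the 09:13:09:837 EVENT1 line followed by the EVENT2 line, i.e. the EVENT1 the question wants to keep.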

Re: Parsing Pattern Question
by bichonfrise74 (Vicar) on Sep 03, 2009 at 20:15 UTC

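#!/usr/bin/perl

use strict;
use warnings;

my %record;          # block number => { event number => line }
my $previous_event;
my $max_event = 4;   # the event that closes a block
my $key = 1;         # current block number

while (my $line = <DATA>) {
    chomp( $line );
    my ($current_event) = $line =~ /EVENT(\d)/
        or next;     # skip lines without an event tag

    # A later line for the same event overwrites the earlier one.
    $record{$key}->{$current_event} = $line;

    # The first EVENT4 of a run closes the current block.
    $key++ if ( $current_event == $max_event &&
      ( !defined $previous_event || $previous_event != $current_event ) );
    $previous_event = $current_event;
}

# Sort block numbers numerically so that block 10 follows block 9.
for my $i (sort { $a <=> $b } keys %record) {
    print map { $record{$i}->{$_} . "\n" } 
      sort keys %{ $record{$i} };
}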

__DATA__
09:12:50:861 EVENT1
09:13:09:467 EVENT1
09:13:09:837 EVENT1
09:13:38:059 EVENT2
09:14:03:115 EVENT3
09:14:04:076 EVENT4
09:14:11:376 EVENT1
09:14:34:049 EVENT2
09:14:34:990 EVENT3
09:14:34:990 EVENT3
09:14:34:990 EVENT4


Re: Parsing Pattern Question
by ambrus (Abbot) on Sep 04, 2009 at 10:17 UTC

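uniq -f1 -w7 will discard duplicate events but keep the first one of each chunk: -f1 skips the first field (the timestamp) and -w7 compares no more than the next 7 characters (the separating space plus the event name).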

See Re^2: Joining two files on common field for a list of other nodes where the unix textutils are suggested for merging files.
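For comparison, here is a minimal Perl sketch of the same keep-the-first-line-of-each-run rule, assuming the event name is the second whitespace-separated field of each line. first_of_runs is a hypothetical name. Note that on the sample this keeps the first EVENT1 (09:12:50:861), which the question marks as unwanted; that is the inconsistency BrowserUk points out.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first line of each run of lines whose
# second whitespace-separated field (the event name) is the same.
# Similar in spirit to `uniq -f1 -w7` on this log format.
sub first_of_runs {
    my @kept;
    my $previous;
    for my $line (@_) {
        my (undef, $event) = split ' ', $line;
        next unless defined $event;              # skip short lines
        push @kept, $line
            unless defined $previous && $event eq $previous;
        $previous = $event;
    }
    return @kept;
}

my @result = first_of_runs(
    '09:12:50:861 EVENT1',
    '09:13:09:467 EVENT1',
    '09:13:09:837 EVENT1',
    '09:13:38:059 EVENT2',
);
print "$_\n" for @result;
```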
