ImJustAFriend has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks. I am having a very annoying code issue right now concerning capturing the previous line of a file and matching it, printing an alert, then moving on. First, my input file:

2014-05-20 18:47:08.805161 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16
2014-05-20 18:47:08.805691 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01
2014-05-20 18:47:21.335941 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16
2014-05-20 18:47:39.005146 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16
2014-05-20 18:47:39.005647 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01
2014-05-20 18:48:09.205362 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16
2014-05-20 18:48:09.206089 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01
2014-05-20 18:48:39.405393 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16
2014-05-20 18:48:39.405857 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01

Basically, I want to identify consecutive requests (like lines 3 & 4 above). When I see one, I want to print an error line and move on. The top duplicate (in this case, line 3) is always the one to alert on and then discard. I have been banging on this all day... but I'm not getting anywhere. Here's my code thus far:

#!/usr/bin/perl

use strict;
use warnings;

my $hn = `/bin/hostname`;
chomp($hn);
my $in = "/root/$hn.pcap";
my $out = "/root/$hn.times";
my $cl;
my $ts1;
my $ts2;
my $ts1ms;
my $ts2ms;
my $req;
my $res;
my $td;
my $ms;

open IN, "<", "$in" or die "IN: $!\n";
open OUT, ">", "$out" or die "OUT: $!\n";

my $pl = "";

while ( $cl = <IN>) {
    next if ( $cl =~ m/^Running as user.*$/ );
    next if ( $cl =~ m/^Capturing on.*$/ );
    if ( $cl =~ m/^.*Who has.*$/ ) {
        ($ts1) = $cl =~ m/^\d+-\d+-\d+\s(.*?)\s\d+:\d+:.*$/;
        next;
    } elsif ( $cl =~ m/^.*is at.*$/ ) {
        ($ts2) = $cl =~ m/^\d+-\d+-\d+\s(.*?)\s\d+:\d+:.*$/;
    }
    ($ts1ms) = $ts1 =~ m/^.*?\.(.*)/;
    ($ts2ms) = $ts2 =~ m/^.*?\.(.*)/;
    $req = `/root/Time $ts1`;
    $res = `/root/Time $ts2`;
    $td = $res-$req;
    $ms = $ts2ms-$ts1ms;
    #print "ARP Req: $ts1; ARP Res: $ts2; ARP Time: $ms milliseconds\n";
    #print OUT "ARP Req: $ts1; ARP Res: $ts2; ARP Time: $ms milliseconds\n";
}
close IN;
close OUT;

At this point, I'm not even sure what to try or where to put it. Do I look at the next line for a dup and act on the current line if I find one? Do I use the previous line? I'm so lost. Please help me get back on track here, fellow monks... I would appreciate it!!

Thanks!!

Re: Previous Line Matching Issues
by davido (Cardinal) on May 20, 2014 at 21:56 UTC

    On line 23 you have "my $pl = ''". You probably intend to use that variable to hold the "previous line." But it's never being set to anything within the while loop.

    The standard algorithm is something like this:

    my $prev;
    while( my $curr = <$infile> ) {
        chomp $curr;
        if( defined $prev && $curr eq $prev ) {
            # Take some action.
        }
        $prev = $curr;
    }

    If you need to report the line number within the file of the contents of $prev, it's always going to be $.-1.
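
    For the capture in the question, a literal `$curr eq $prev` would never match, because every line carries its own timestamp. A minimal sketch of the same previous-line loop that compares the kind of line instead is shown here. The helper names `arp_kind` and `unanswered_requests` are hypothetical, and the classification assumes that "Who has" and "is at" are the only two line types of interest.

    ```perl
    use strict;
    use warnings;

    # Classify a tshark ARP line as 'request' or 'reply'; undef for anything else.
    sub arp_kind {
        my ($line) = @_;
        return 'request' if $line =~ /Who has/;
        return 'reply'   if $line =~ /is at/;
        return;
    }

    # Return every request line that is immediately followed by another request.
    sub unanswered_requests {
        my (@lines) = @_;
        my @unanswered;
        my $prev;                        # previous ARP line, undef at the start
        for my $curr (@lines) {
            my $kind = arp_kind($curr);
            next unless defined $kind;   # banner lines never become $prev
            if ( defined $prev && $kind eq 'request' && arp_kind($prev) eq 'request' ) {
                push @unanswered, $prev; # the top duplicate is the one reported
            }
            $prev = $curr;
        }
        return @unanswered;
    }

    my @sample = (
        '2014-05-20 18:47:08.805691 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01',
        '2014-05-20 18:47:21.335941 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16',
        '2014-05-20 18:47:39.005146 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16',
        '2014-05-20 18:47:39.005647 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01',
    );
    print "NO ARP RESPONSE FOR: $_\n" for unanswered_requests(@sample);
    ```

    With the four sample lines, only the 18:47:21.335941 request should be reported, since it is the one followed directly by another request.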


    Dave

Re: Previous Line Matching Issues
by LanX (Saint) on May 20, 2014 at 22:28 UTC
Re: Previous Line Matching Issues
by InfiniteSilence (Curate) on May 21, 2014 at 00:13 UTC

    My approach involves using a simple data structure to tell you about the past -- a stack:

    perl -e '@stack = (); sub notify { return qq~!!!!$_[0]!!!~}; for(qw~+ - + - + - + + - +~) {push @stack, $_; ($stack[-2] ne $_ )? print 0 : print notify(1) };'

    What's happening here:

    • You start with a list of intermittent signals; here just + or - that arrive sequentially
    • We use a data structure that will tell us what the last n elements are in order
    • If some criterion is met, we can run some kind of notification. In our case, we want to know whether the current element matches the previous one.

    One additional note: the first example essentially caches everything in the @stack. You can keep only what you need in there by discarding things at the head like this:

    perl -e '@stack = (); sub notify { return qq~!!!!$_[0]!!!~}; for(qw~+ - + - + - + + - +~) {push @stack, $_; ($stack[-2] ne $_ )? print 0 : print notify(1); if (@stack >= 3){shift @stack} }; use Data::Dumper; print Dumper \@stack;'

    gives,

    0000000!!!!1!!!00$VAR1 = [
              '-',
              '+'
            ];
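
    The same bounded-history idea can be written as a small script rather than a one-liner. The following is only a sketch: `make_history` and `mark_repeats` are hypothetical names, and the window size is a parameter so that looking back more than one element needs no other change.

    ```perl
    use strict;
    use warnings;

    # Build a closure that remembers at most $size recent items.
    sub make_history {
        my ($size) = @_;
        my @window;
        return sub {
            my ($item) = @_;
            my @before = @window;                 # history as it stood before $item
            push @window, $item;
            shift @window while @window > $size;  # discard from the head
            return @before;
        };
    }

    # Emit 0 for each signal, or a notification when it repeats the previous one.
    sub mark_repeats {
        my ( $size, @signals ) = @_;
        my $remember = make_history($size);
        my $report   = '';
        for my $signal (@signals) {
            my @before    = $remember->($signal);
            my $is_repeat = @before && $before[-1] eq $signal;
            $report .= $is_repeat ? '!!!!1!!!' : '0';
        }
        return $report;
    }

    my $report = mark_repeats( 2, qw(+ - + - + - + + - +) );
    print "$report\n";    # prints 0000000!!!!1!!!00
    ```

    Because the closure returns the history as it was before the new item arrived, `$before[-1]` is always the previous element and `$before[-2]` the one before that, whatever the window size.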

    Celebrate Intellectual Diversity

Re: Previous Line Matching Issues
by ImJustAFriend (Scribe) on May 21, 2014 at 06:11 UTC

    Many thanks to everyone who posted help! After leaving it for a while, I realized I needed to come at it from a "next line" perspective. After much reading and experimenting (and debugging) it now works how I wanted it to work. Here it is, in case it can help someone else:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $hn = `/bin/hostname`;
    chomp($hn);
    my $in = "/MDS/$hn.pcap";
    my $out = "/root/$hn.times";
    my $cl;
    my $nl;
    my $pos;
    my $ts1;
    my $ts2;
    my $ts1ms;
    my $ts2ms;
    my $req;
    my $res;
    my $td;
    my $ms;

    open IN, "<", "$in" or die "IN: $!\n";
    open OUT, ">", "$out" or die "OUT: $!\n";

    while ( $cl = <IN>) {
        next if ( $cl =~ m/^Running as user.*$/ );
        next if ( $cl =~ m/^Capturing on.*$/ );
        $pos = tell(IN);
        if ( $cl =~ m/^.*Who has.*$/ ) {
            $nl = <IN>;
            if ( $nl =~ m/^.*Who has.*$/ ) {
                print "NO ARP RESPONSE FOR: $cl";
                undef $nl;
                seek(IN, $pos, 0);
            } else {
                ($ts1) = $cl =~ m/^\d+-\d+-\d+\s(.*?)\s\d+:\d+:.*$/;
                undef $nl;
                seek(IN, $pos, 0);
            }
            next;
        } elsif ( $cl =~ m/^.*is at.*$/ ) {
            ($ts2) = $cl =~ m/^\d+-\d+-\d+\s(.*?)\s\d+:\d+:.*$/;
        }
        ($ts1ms) = $ts1 =~ m/^.*?\.(.*)/;
        ($ts2ms) = $ts2 =~ m/^.*?\.(.*)/;
        $req = `/root/Time $ts1`;
        $res = `/root/Time $ts2`;
        $td = $res-$req;
        $ms = ($ts2ms-$ts1ms)/1000;
        print "ARP Req: $ts1; ARP Res: $ts2; ARP Time: $ms milliseconds\n";
        print OUT "ARP Req: $ts1; ARP Res: $ts2; ARP Time: $ms milliseconds\n";
    }
    close IN;
    close OUT;
Re: Previous Line Matching Issues
by locked_user sundialsvc4 (Abbot) on May 21, 2014 at 15:22 UTC

    Well, if it works, then so be it, but it would certainly flunk a code-review from me, and if you were on my team I would instruct you to rewrite it. Let me explain why.

    I have certain pre-conceived notions about how such an algorithm “should” be implemented, such that I am unnecessarily thrown-off by strategies which depart from these norms. Specifically:

    1. I expect there to be only one place where the next-line of the file is read. Your version contains at least two. I am therefore immediately suspicious that this logic will skip lines.
    2. Cryptic variable names such as $cl and $nl, in a substantial piece of code like this, are not acceptable.
    3. This logic looks ahead, where I expect to see state-machine like logic which remembers its past.
    4. I can anticipate that this business requirement might soon call for looking-back some arbitrary-n lines, and I do not see how this logic would be readily adapted to do this.
    5. Overall complexity. This approach is harder than it needs to be, and I doubt that it is accompanied by a rigorous test-suite sufficient to prove that it operates correctly in each and every edge-case that it might encounter in production.

    And please, don’t take this assessment personally. It’s not about you. This is engineering.

    In any backward-looking loop like this one, the issue at hand is always how to correctly maintain the past-history, no matter what pathway is taken through the code. The next if idioms at the top of the loop are red-flags to me: you are not updating the history before you do next. There are other red-flags that jump out at me ... and basically, I don’t want code in the source-base that has any red-flags that jump out at me. I want to know that it is correct having proven that it is correct such that I never have to look at it again.

    In closing, no, I did not exhaustively review the code as I would. Instead, I saw enough of it to realize that it was not obvious, therefore it could not be obviously-correct, therefore it could be a future source of problems. And so, that’s why I would instruct you to rewrite it and of course pay you to do so.
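
    One way the single-read, remember-the-past loop described in this review might look is sketched here. It is not the reviewer's code. The names `to_seconds` and `pair_arp` are hypothetical, the alert prints only the request timestamp rather than the whole line, and the elapsed time is computed from the timestamps themselves instead of calling the `/root/Time` helper, on the assumption that a request and its reply fall on the same day.

    ```perl
    use strict;
    use warnings;

    # Convert "HH:MM:SS.ffffff" to seconds since midnight (no date rollover handling).
    sub to_seconds {
        my ($clock) = @_;
        my ( $h, $m, $s ) = split /:/, $clock;
        return $h * 3600 + $m * 60 + $s;
    }

    # Walk the capture once; the only state is the timestamp of the pending request.
    sub pair_arp {
        my ($fh) = @_;
        my @report;
        my $pending;                          # undef means no request is waiting
        while ( my $line = <$fh> ) {          # the only place a line is read
            chomp $line;
            my ($clock) = $line =~ /^\d+-\d+-\d+\s+(\S+)\s/
                or next;                      # banner lines leave the state untouched
            if ( $line =~ /Who has/ ) {
                push @report, "NO ARP RESPONSE FOR: $pending" if defined $pending;
                $pending = $clock;
            }
            elsif ( $line =~ /is at/ && defined $pending ) {
                my $ms = ( to_seconds($clock) - to_seconds($pending) ) * 1000;
                push @report, sprintf( 'ARP Req: %s; ARP Res: %s; ARP Time: %.3f milliseconds',
                    $pending, $clock, $ms );
                undef $pending;
            }
        }
        return @report;
    }

    my $capture = join "\n",
        'Capturing on eth0',
        '2014-05-20 18:47:08.805161 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16',
        '2014-05-20 18:47:08.805691 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01',
        '2014-05-20 18:47:21.335941 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16',
        '2014-05-20 18:47:39.005146 00:00:00:00:00:02 -> ff:ff:ff:ff:ff:ff ARP Who has 4.3.2.1? Tell 4.3.2.16',
        '2014-05-20 18:47:39.005647 00:00:00:00:00:01 -> 00:00:00:00:00:02 ARP 4.3.2.1 is at 00:00:00:00:00:01';
    open my $in, '<', \$capture or die "in-memory open failed: $!\n";
    print "$_\n" for pair_arp($in);
    ```

    Subtracting whole timestamps rather than only the fractional parts keeps the result sensible when a reply arrives after the seconds field has rolled over. Skipped banner lines cannot corrupt the history here, because the state changes only on request and reply lines.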

      sundialsvc4, thank you very much for your review. I don't take anything you said personally at all. I am always happy to get critique on my work, good or bad.

      In regards to your comments... I am not on a development team or producing enterprise level code in any way. Most of my scripts are used locally by my team in a very specific one-off scenario. In this case, we were using it to read the output of a tshark command that was always formatted the same way. Had this been something to use repeatedly, there would have been differences... among them fully commented code, sensible variable names (like $currentLine instead of $cl), etc.

      That said, I am also not a professional programmer, CS guy, or Perl expert - hence the original post looking for help. Coming into this script, I knew mostly *what* I wanted to do... but the *how* of moving around in a text file back and forth was new to me. So I Googled, read O'Reilly materials, and experimented until I got something that worked. What you don't see here is the 20 lines of "DEBUG: print..." I took back out! (( grin ))

      I would be very interested in how a programmer and Perl expert would tackle this. Could I ask you, if you have time, to re-do the algorithm in a "good" way? It would be purely for education...

      By the way, I think you're awesome for taking the time to look at my code and provide feedback!!

      Cheers!!

      ImJustAFriend