Another Array Problem: comparing.

dru145 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

Ok, I'm stuck. I have a log file in this format:

1;30Nov2001;17:08:25;192.168.1.2;log;accept;;hme0;outbound;udp;192.168.86.6;20.248.36.99;domain-udp;1103;63;85;;;;;;;;;;;;;;;
2;30Nov2001;17:08:25;192.168.1.2;log;drop;;hme0;inbound;tcp;63.28.96.254;192.168.11.67;netbios-ssn;18803;48;89;;;;;;;;;;;;;;;
3;30Nov2001;17:08:26;192.168.1.2;log;drop;;hme0;inbound;tcp;65.93.20.223;192.168.26.139;auth;1323;60;89;;;;;;;;;;;;;;;
4;30Nov2001;17:08:26;192.168.1.2;log;drop;;hme0;inbound;tcp;65.93.22.223;192.168.26.139;auth;1323;60;89;;;;;;;;;;;;;;;
5;30Nov2001;17:08:26;192.168.1.2;log;accept;;qfe2;inbound;tcp;192.168.86.146;20.248.36.97;http;4719;44;85;;;;;;;;;;;;;;;
6;30Nov2001;17:08:26;192.168.1.2;log;accept;;hme0;outbound;tcp;192.168.86.146;204.48.36.97;http;4719;44;85;;;;;;;;;;;;;;;
7;30Nov2001;17:08:26;192.168.1.2;log;accept;;qfe2;inbound;tcp;192.168.86.146;204.48.36.97;http;4721;44;85;;;;;;;;;;;;;;;
8;30Nov2001;17:08:26;192.168.1.2;log;accept;;hme0;outbound;tcp;192.168.86.146;24.248.36.97;http;4721;44;85;;;;;;;;;;;;;;;
8;30Nov2001;17:08:26;192.168.1.2;log;accept;;hme0;outbound;tcp;192.168.86.146;20.248.36.97;http;4721;44;85;;;;;;;;;;;;;;;
9;30Nov2001;17:08:26;192.168.1.2;log;accept;;qfe2;inbound;tcp;192.168.27.154;205.18.145.185;http;4396;44;85;;;;;;;;;;;;;;;
10;30Nov2001;17:08:26;192.168.1.2;log;accept;;hme0;outbound;tcp;192.168.27.154;25.188.145.185;http;4396;44;85;;;;;;;;;;;;;;;
11;30Nov2001;17:08:26;192.168.1.2;log;accept;;qfe2;inbound;tcp;192.168.27.154;205.88.145.185;http;4397;44;85;;;;;;;;;;;;;;;
12;30Nov2001;17:08:26;192.168.1.2;log;accept;;hme0;outbound;tcp;192.168.27.154;205.188.45.185;http;4397;44;85;;;;;;;;;;;;;;;

And here is the code I have written so far:

#!/usr/bin/perl -w
use strict;
 
 
my $log = './log';
my @data;
 
# Open the firewall log file and create new array containing all of the data.
 
open (LOG, $log) or die "Can't open $log: $!";
while (<LOG>){
  push (@data, "$_");
}
 
# Split the @data array into separate arrays by category.
 
my (@dst, @service);
 
foreach (@data) {
  my @lines=split "\n",$_;
   foreach(@lines){
    my ($num,$date,$time,$fw,$type,$action,$alert,$int,$dir,$proto,$src,$dst,$service,$sport,$len,$rule) = (split /;/,$_);
    push(@dst, $dst); 
    push(@service, $service); 
  }
}

This seems to work fine, but what I need to do now is compare the @dst and @service arrays: if the same IP in @dst appears together with the same service in @service for at least 50 log entries, I want to execute a sub. I can't think of how to do this.

Any suggestions?

TIA

-Dru

Edit kudra, 2001-12-22 Appended to title


Replies are listed 'Best First'.
Re: Another Array Problem. by TomK32 (Monk) on Dec 19, 2001 at 01:57 UTC
I don't have a clean solution, but this works. Replace

    push(@dst, $dst);
    push(@service, $service);

with

    $data{$dst}{$service}++;
    if ($data{$dst}{$service} == 50) { magic(); }   # magic() is whatever sub you want to run

Under use strict, %data needs a `my %data;` before the loop. It is a separate variable from the existing @data array.

--
package Lizard::King; sub can { do { 'anything'} };
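A minimal, self-contained sketch of the hash-of-hashes counter from the reply above, for anyone who wants to run it. The helper `make_line`, the handler `on_threshold`, the sample entries, and the lowered threshold of 3 are all assumptions made for illustration; the thread itself uses 50 and a real log file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical threshold: the thread asks for 50; 3 keeps the sample data short.
my $THRESHOLD = 3;

my %seen;       # $seen{$dst}{$service} = entries seen so far for that pair
my @triggered;  # pairs that reached the threshold

# Hypothetical stand-in for the sub to run ("do magic" in the reply).
sub on_threshold {
    my ($dst, $service) = @_;
    push @triggered, "$dst/$service";
}

# Build a made-up log line in the thread's layout (dst = field 11, service = field 12).
sub make_line {
    my ($num, $dst, $service) = @_;
    return join ';', $num, '30Nov2001', '17:08:26', '192.168.1.2', 'log',
        'accept', '', 'hme0', 'outbound', 'tcp', '192.168.86.146',
        $dst, $service, '4719', '44', '85';
}

my @lines = (
    make_line(1, '20.248.36.97',   'http'),
    make_line(2, '192.168.26.139', 'auth'),
    make_line(3, '20.248.36.97',   'http'),
    make_line(4, '20.248.36.97',   'auth'),
    make_line(5, '20.248.36.97',   'http'),
    make_line(6, '20.248.36.97',   'http'),
);

for my $line (@lines) {
    my ($dst, $service) = (split /;/, $line)[11, 12];
    next unless defined $dst && defined $service;   # skip short or blank lines

    # == (not >=) so the handler runs once per pair, when the count first hits the threshold
    on_threshold($dst, $service) if ++$seen{$dst}{$service} == $THRESHOLD;
}

print "reached $THRESHOLD entries: $_\n" for @triggered;
```

Testing with `==` rather than `>=` means the handler fires only on the entry that brings a pair up to the threshold, not on every entry after it.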
(Ovid) Re: Another Array Problem. by Ovid (Cardinal) on Dec 19, 2001 at 02:01 UTC
This isn't tested, but I cleaned up your code and added a couple of things to do what I think you need:

    #!/usr/bin/perl -w
    use strict;

    my $log = './log';

    # Open the firewall log file and create new array containing all of the data.
    open LOG, "<", $log or die "Can't open $log: $!";
    my @data = <LOG>;
    close LOG;

    # Split the @data array into separate arrays by category.
    my (@dst, @service);
    my $dup_count    = 0;
    my $last_dst     = '';
    my $last_service = '';

    foreach (@data) {
      my @lines=split "\n",$_;
      foreach(@lines){
        my ($dst,$service) = (split /;/,$_)[11,12];
        push(@dst, $dst);
        push(@service, $service);
        if ( $dst eq $last_dst and $service eq $last_service ) {
          $dup_count++;
          # you probably want to clean up dup_count here to avoid func
          # being called for dup 51, dup 52, etc
          &some_func if $dup_count >= 50;
        }
        else {
          # didn't match, so we reset;
          $dup_count = 0;
        }
        $last_dst     = $dst;
        $last_service = $service;
      }
    }

Cheers,
Ovid

Update: Okay, I think I was confused about the specification. I thought we were looking for 50 identical dsts and services in a row. Reading the question more closely, it appears to be a hash-related issue, in which case TomK32 gave a good answer.

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
Re: Another Array Problem. by aijin (Monk) on Dec 19, 2001 at 02:03 UTC
There are a couple of things that aren't clear to me here.

First, why read the file into an array and then go through the array and split the lines? Are you sure any of the lines are splitting? When you `while(<LOG>)` you are reading the file line by line, which makes the split on "\n" in your foreach loop unnecessary.

Secondly, it's not clear what exactly you're checking, so I'm going to make the following assumptions. Please correct me if I'm wrong.

1. You want to check if all the entries in the @dst array are the same IP.
2. You want to know when there are 50+ of any service in the @service array.

Both of these tasks can be solved with the use of a hash. I suggest you meander over to the Categorized Questions and Answers section and read up on hashes.

-a.
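Both assumptions in the reply above can be checked with two small hashes. The sketch below is only an illustration: it reads from an in-memory filehandle instead of ./log, the three sample entries are made up in the thread's field layout, and `$MIN_SERVICE_HITS` is a hypothetical threshold lowered from 50. It also records how many pieces a split on "\n" produces for each line read, to show why that split is unnecessary.

```perl
use strict;
use warnings;

my $MIN_SERVICE_HITS = 2;   # hypothetical; the thread uses 50

# Made-up sample entries (dst = field 11, service = field 12).
my @sample = (
    '1;30Nov2001;17:08:26;192.168.1.2;log;accept;;hme0;outbound;tcp;192.168.86.146;20.248.36.97;http;4719;44;85',
    '2;30Nov2001;17:08:26;192.168.1.2;log;accept;;hme0;outbound;tcp;192.168.86.146;20.248.36.97;http;4721;44;85',
    '3;30Nov2001;17:08:26;192.168.1.2;log;drop;;hme0;inbound;tcp;65.93.20.223;192.168.26.139;auth;1323;60;89',
);
my $log_text = join("\n", @sample) . "\n";

# Read from a string as if it were the log file.
open my $fh, '<', \$log_text or die "Can't open in-memory log: $!";

my (%dst_count, %service_count);
my $max_pieces = 0;

while (my $line = <$fh>) {
    # Each read returns exactly one line, so splitting on "\n" yields one piece.
    my @pieces = split /\n/, $line;
    $max_pieces = @pieces if @pieces > $max_pieces;

    chomp $line;
    my ($dst, $service) = (split /;/, $line)[11, 12];
    $dst_count{$dst}++;
    $service_count{$service}++;
}
close $fh;

# Assumption 1: are all destinations the same IP?
my $all_same_dst = (keys(%dst_count) == 1) ? 1 : 0;

# Assumption 2: which services appear at least $MIN_SERVICE_HITS times?
my @candidates    = grep { $service_count{$_} >= $MIN_SERVICE_HITS } keys %service_count;
my @busy_services = sort @candidates;

print "pieces per line: $max_pieces\n";
print "all destinations identical: ", ($all_same_dst ? "yes" : "no"), "\n";
print "busy service: $_ ($service_count{$_})\n" for @busy_services;
```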
Re: Re: Another Array Problem. by dru145 (Friar) on Dec 19, 2001 at 20:23 UTC
aijin,

You are correct: I didn't need the first array (@data), nor did I need to split on a newline. Thanks for the hash tip. I figured that's what I needed, but I don't have much experience with hashes, so I took this time to learn. I think I'm finally grasping them. Here is the code I came up with (suggestions appreciated):

    #!/usr/bin/perl -w
    use strict;

    my $log = './log';
    my (%count, %hash);

    open (LOG, $log) or die "Can't open $log: $!";
    while (<LOG>){
      foreach($_){
        my ($num,$date,$time,$fw,$type,$action,$alert,$int,$dir,$proto,$src,$dst,$service,$sport,$len,$rule) = (split /;/,$_);
        %hash = (dest => $dst, service => $service);
        foreach my $key (keys %hash){
          my $val = $hash{$key};
          $count{$val}++;
        } #close foreach
      } #close foreach
    }#close while

    foreach my $key1 (keys %count){
      print "$key1 appears $count{$key1} times\n";
    } #close foreach

I still need something that will run a sub if both the destination IP AND the service appear AT LEAST 50 times in the log files, but I think this will be fairly easy.

Thanks again,
Dru
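The code above counts each destination and each service separately, so it cannot tell which destination went with which service. One possible way to close that gap, sketched here under assumptions, is to key a single hash on the pair. The helper names `make_line` and `pairs_over_threshold`, the sample entries, and the threshold of 3 are hypothetical; the thread's real threshold is 50.

```perl
use strict;
use warnings;

my $MIN_HITS = 3;   # hypothetical; the thread uses 50

# Build a made-up log line in the thread's layout (dst = field 11, service = field 12).
sub make_line {
    my ($num, $dst, $service) = @_;
    return join ';', $num, '30Nov2001', '17:08:26', '192.168.1.2', 'log',
        'accept', '', 'hme0', 'outbound', 'tcp', '192.168.86.146',
        $dst, $service, '4719', '44', '85';
}

# Return [dst, service, count] for every pair seen at least $min times.
sub pairs_over_threshold {
    my ($min, @lines) = @_;
    my %count;
    for my $line (@lines) {
        my ($dst, $service) = (split /;/, $line)[11, 12];
        next unless defined $dst && defined $service;
        # ';' is the log's field separator, so it cannot occur inside either value
        $count{"$dst;$service"}++;
    }
    my @hot = grep { $count{$_} >= $min } keys %count;
    return map { [ (split /;/, $_), $count{$_} ] } sort @hot;
}

my @lines = (
    make_line(1, '20.248.36.97',   'http'),
    make_line(2, '192.168.26.139', 'auth'),
    make_line(3, '20.248.36.97',   'http'),
    make_line(4, '192.168.26.139', 'auth'),
    make_line(5, '20.248.36.97',   'auth'),
    make_line(6, '20.248.36.97',   'http'),
    make_line(7, '192.168.26.139', 'auth'),
    make_line(8, '20.248.36.97',   'http'),
);

my @results = pairs_over_threshold($MIN_HITS, @lines);
for my $r (@results) {
    my ($dst, $service, $n) = @$r;
    print "$dst with $service appears $n times\n";   # call the real sub here instead
}
```

Unlike a counter checked inside the read loop, this version reports after the whole file has been read, so each qualifying pair is handled once with its final count.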
Re: Re: Re: Another Array Problem. by Juerd (Abbot) on Dec 19, 2001 at 20:49 UTC
Suggestions welcome? Here they come :)

    foreach($_){

foreach($_) is kind of useless; you can safely remove it (and its closing bracket, of course).

    my ($num,$date,$time,$fw,$type,$action,$alert,$int,$dir,$proto,$src,$dst,$service,$sport,$len,$rule) = (split /;/,$_);

You don't have to name everything. Instead, you can assign to undef if you don't need a specific value.

    my (undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, $dst, $service, undef, undef, undef) = split /;/;
    # split() works on $_ if only one argument is given.

Because there are more undefs than used values, a list slice would be even better:

    my ($dst, $service) = (split /;/)[11, 12];

Next:

    %hash = (dest => $dst, service => $service);
    foreach my $key (keys %hash){
      my $val = $hash{$key};
      $count{$val}++;
    } #close foreach
    } #close while

There's no need to use these temporary variables %hash and $val. Well-indented code doesn't need "#close foreach" comments (unless it's a huge sub, but in that case the design was probably wrong anyway). Because only the values of the hash are used and they're set within the same scope, there's no need for the hash at all. I'll also use the for-modifier (for equals foreach, but is shorter) to demonstrate perl's nice syntactic features.

      $count{$_}++ for $dst, $service;
    }

Next:

    foreach my $key1 (keys %count){
      print "$key1 appears $count{$key1} times\n";
    } #close foreach

This can be done using map, but it might be confusing if you don't know how it works:

    print map "$_ appears $count{$_} times\n", keys %count;

Please also note I have a whitespace after every comma, which in my opinion makes the source more readable.

I hope this was useful to you. As a whole:

    #!/usr/bin/perl -w
    use strict;

    my $log = './log';
    my %count;

    open (LOG, $log) or die "Can't open $log: $!";
    while (<LOG>){
        my ($dst, $service) = (split /;/)[11, 12];
        $count{$_}++ for $dst, $service;
        # Now I see it this way, I realise that
        #     $count{$_}++ for (split /;/)[11, 12];
        # would be even better :)
    }

    print map "$_ appears $count{$_} times\n", keys %count;

    2;0 juerd@ouranos:~$ perl -e'undef christmas'
    Segmentation fault
    2;139 juerd@ouranos:~$
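For readers who want to try the idioms from the reply above side by side, here is a small runnable sketch. It uses entry 5 from the original question as input; the variable names with `_a` and `_b` suffixes and the `@report` array are introduced only for this illustration.

```perl
use strict;
use warnings;

# Entry 5 from the log in the question.
my $line = '5;30Nov2001;17:08:26;192.168.1.2;log;accept;;qfe2;inbound;tcp;192.168.86.146;20.248.36.97;http;4719;44;85;;;;;;;;;;;;;;;';

# Form 1: assign unwanted fields to undef (eleven placeholders for fields 0..10).
my (undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef,
    $dst_a, $service_a) = split /;/, $line;

# Form 2: a list slice picks fields 11 and 12 directly.
my ($dst_b, $service_b) = (split /;/, $line)[11, 12];

# The for statement modifier aliases $_ to each listed value in turn.
my %count;
$count{$_}++ for $dst_b, $service_b;

# map in its expression form builds one output string per key.
my @report = map "$_ appears $count{$_} times\n", sort keys %count;
print @report;
```

The `sort` is added here only to make the output order predictable; `keys` on its own returns the keys in no particular order.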

