monger has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monastery Dwellers, I just returned to work from a SANS/GIAC course on Intrusion Detection. To prepare for the exam, I want to write a new script for analysing firewall log files (unfortunately, GIAC dropped the practical, so I'm having to prepare for the test ad hoc) :-(. So, here's what I want to do:
my $regex = "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}";
my @win_ports = (135, 137, 139, 445, 1025, 1433, 1434);
my @trojan_ports = (113, 15118, 4899);
my $file = "C:\\fw.log";
open LOG, "$file" || die "Can't open fwlog: $!"\n;
while (<LOG>) {
    foreach $port(@win_ports) {
        if (/$regex\/$port/g) { print }
OK - that's roughly it. I'll have to work on the regex to carefully get just what I want. I'll also write in output files to mate with the arrays at the beginning. Now, questions:

1) What would be the fastest way to chunk through a file, looking for say, 10-20 ports? I could be dealing with files over 100MB, so I want to make sure it's optimized as much as possible.

2) I would like to print them to the file grouped by port, and from there, I can do some more analysis. Suggestions for capturing, for instance, all the matches for port 445 and then writing them to the $win_ports.txt file, then concatenating the matches for 135, etc?

Thanks, monger

Monger +++++++++++++++++++++++++ Munging Perl on the side

Replies are listed 'Best First'.
Re: Firewall Log Analysis - port matrices
by kvale (Monsignor) on Mar 17, 2005 at 19:43 UTC
    Note that your code, as posted above, does not compile.

    You did not state the format of the log file, so I created my own :) Here is code to do roughly what you want:

    my @trojan_ports = (113, 15118, 4899);
    my %trojan;
    @trojan{@trojan_ports} = 1;
    my @win_ports = (135, 137, 139, 445, 1025, 1433, 1434);
    my %win;
    @win{@win_ports} = 1;
    my %port_ip;
    while (<DATA>) {
        chomp;
        my ($ip, $port) = split m"/";
        push @{ $port_ip{$port} }, $ip;
    }
    foreach my $port (sort {$a <=> $b} keys %port_ip) {
        if (exists $win{$port}) {
            print "Windows port $port\n";
        }
        elsif (exists $trojan{$port}) {
            print "Trojan port $port\n";
        }
        else {
            print "Unknown port $port\n";
        }
        print " $_\n" foreach @{ $port_ip{$port} };
    }
    __DATA__
    1.2.3.4/135
    1.2.3.5/135
    1.2.3.6/135
    1.2.3.4/137
    1.2.3.7/137
    1.2.3.9/113
    1.2.3.10/111

    -Mark

Re: Firewall Log Analysis - port matrices
by jhourcle (Prior) on Mar 17, 2005 at 19:36 UTC

    It's hard to recommend how to go through the file without seeing sample input. You should probably also think about building a single pattern up front, so you don't need to regenerate multiple patterns for each line, such as:

    my $ports = '(?:'.join('|', @win_ports).')';
    while (<LOG>) {
        print if ( m#$regex/$ports#o );
    }

    I also noticed you never did anything with @trojan_ports.

    Update: I realized that this also didn't deal with part #2 of your issue -- breaking things down by port. Assuming you had enough memory to deal with keeping the whole thing in memory, I'd probably push the records into arrays in memory, and then print them out when done, given the considerations. However, I still have no idea what the format of the log files is, and it's entirely possible that you may have multiple matches per line, if you have both the remote and local ip/port combination. I'd probably use logic similar to:

    my $atom = qr/[12]?\d{1,2}/;   # one octet; [12] rather than [1,2], which would also match a comma
    my @ports_win    = qw( 135 137 139 445 1025 1433 1434 );
    my @ports_trojan = qw( 113 15118 4899 );
    my $file  = 'C:\fw.log';
    my $ports = '('.join('|',@ports_win, @ports_trojan).')';
    my $ip    = qr#$atom\.$atom\.$atom\.$atom/$ports#;
    my %lines = ();

    open LOG, '<', $file or die "Can't read from $file : $!";
    while ( my $line = <LOG> ) {
        if ( $line =~ m/$ip/o ) {
            # $1 is the matched port; collect the lines per port
            push ( @{$lines{$1}}, $line );
        }
    }
    foreach my $port ( 'WINDOWS', @ports_win, 'TROJAN', @ports_trojan ) {
        print "\n\n$port\n-------\n",@{$lines{$port}||[]};
    }

Re: Firewall Log Analysis - port matrices
by cazz (Pilgrim) on Mar 18, 2005 at 02:31 UTC
    That isn't very exact... how about:
    use Regexp::Common qw /net/;
    my %bad = (3128 => 1, 31337 => 1);
    my %win = (138 => 1, 139 => 1);
    while (<>) {
        /$RE{net}{IPv4}\/(\d{1,5})/ && ($bad{$1} || $win{$1}) && print;
    }
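
For reference, here is a minimal sketch that combines the ideas in the replies above: one pass over the log, a hash lookup on the captured port instead of one regex per port, and output grouped by port into one file per port list. The log format (an IPv4 address, a slash, then the port) is an assumption, since the thread never shows a sample line. The names `%group_of`, `bucket_lines` and `write_groups` are hypothetical and not from any reply.

```perl
use strict;
use warnings;

# Hypothetical port groups, taken from the arrays in the question.
my %group_of;
$group_of{$_} = 'win_ports'    for (135, 137, 139, 445, 1025, 1433, 1434);
$group_of{$_} = 'trojan_ports' for (113, 15118, 4899);

# ASSUMED log format: IPv4 address, slash, port (e.g. 1.2.3.4/445).
# The trailing \b keeps 135 from matching inside 1350.
my $ip_port = qr{\b\d{1,3}(?:\.\d{1,3}){3}/(\d{1,5})\b};

# Read the filehandle once and bucket lines as $hits{group}{port} = [lines].
sub bucket_lines {
    my ($fh) = @_;
    my %hits;
    while (my $line = <$fh>) {
        my %seen;    # a line may carry both source and destination ip/port
        while ($line =~ /$ip_port/g) {
            my $port  = $1;
            my $group = $group_of{$port} or next;   # not a port we care about
            next if $seen{$port}++;                 # store a line once per port
            push @{ $hits{$group}{$port} }, $line;
        }
    }
    return \%hits;
}

# Write one file per group (e.g. win_ports.txt), matches grouped by port.
sub write_groups {
    my ($hits, $dir) = @_;
    for my $group (sort keys %$hits) {
        my $path = "$dir/$group.txt";
        open my $out, '>', $path or die "Can't write $path: $!";
        for my $port (sort { $a <=> $b } keys %{ $hits->{$group} }) {
            print {$out} "== port $port ==\n", @{ $hits->{$group}{$port} };
        }
        close $out or die "Can't close $path: $!";
    }
}
```

A typical call would be `open my $fh, '<', $file or die "Can't read $file: $!"; write_groups(bucket_lines($fh), '.');`. Like the in-memory approach described earlier in the thread, this keeps every matching line in memory until the end, so for very large logs with many hits it may be better to open the output files first and print each match as it is found.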