Algorithmic difficulties

Guildenstern has asked for the wisdom of the Perl Monks concerning the following question:

I'm using Perl to write a script that will parse the log(s) of some intrusion detection software that we are running. Since the software runs the network cards in promiscuous mode, there are a lot of entries in the logs that don't pertain to our box. Luckily, the software explicitly logs source and destination IP addresses. Unluckily, it only does so for connections; raw ASCII data captures do not include full information. Here is a snippet from the log (various things have been removed for clarity):

*snip* incoming connection from=(111.111.111.111:1234) to=(22.22.22.22:80)
*snip* ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), *data here*

What I planned to do was parse the IP:port pairs for connections, and place the data for connections not to our box into a hash. Then I would use the information in the hash as an "ignore" list. The only problem I can see is that I have no way of knowing when a certain connection has completed and I can safely remove the entry from the hash.
To be more explicit - if the log were to look like this:

*snip* incoming connection from=(111.111.111.111:1234) to=(22.22.22.22:80)
*snip* ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), *data here*
*snip* incoming connection from=(111.111.111.111:1234) to=(my.box.ip.addr:80)
*snip* ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), *data here*

I would ignore the connection from 111.111.111.111:1234 to 22.22.22.22:80 since that connection is not destined for my box. But, since the ASCII capture only catches the destination port, the second captured ASCII packet would be discarded since the source IP is listed as being ignored.
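
To make the failure concrete, here is a minimal sketch of the ignore-list plan exactly as described. The address `10.0.0.5`, the sub name `naive_filter`, and the regexes are assumptions for illustration; the real log lines carry more fields than the snippet shows.

```perl
use strict;
use warnings;

# Hypothetical address of "our" box; the real value is not given in the post.
my $MY_ADDR = '10.0.0.5';

# Naive filter: once a source is on the ignore list it never comes off.
# Returns the data lines that survive the filter.
sub naive_filter {
    my @lines = @_;
    my (%ignore, @kept);
    for my $line (@lines) {
        if ($line =~ /from=\(([^)]+)\)\s+to=\(([^):]+):\d+\)/) {
            # Connection line: remember sources that talk to another box.
            my ($src, $dst_ip) = ($1, $2);
            $ignore{$src} = 1 if $dst_ip ne $MY_ADDR;
        }
        elsif ($line =~ /from=\(([^)]+)\),\s*localport=\(\d+\)/) {
            # Data line: only the source ip:port is available for the lookup.
            push @kept, $line unless $ignore{$1};
        }
    }
    return @kept;
}

my @log = (
    'incoming connection from=(111.111.111.111:1234) to=(22.22.22.22:80)',
    'ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), first',
    "incoming connection from=(111.111.111.111:1234) to=(${MY_ADDR}:80)",
    'ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), second',
);

my @kept = naive_filter(@log);
print scalar(@kept), " data line(s) kept\n";
```

With this four-line log, both data lines are dropped, including the one that follows the connection to our box, because nothing ever removes the source from `%ignore`.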

My question (finally!) is this: how do I construct the hash and populate and remove entries so that I can accurately reflect connections that I can ignore? Is this even something that can be done easily? Does my question even make sense?

Guildenstern
Negaterd character class uber alles!


Replies are listed 'Best First'.
Re: Algorithmic difficulties by jeroenes (Priest) on Dec 21, 2000 at 03:52 UTC
The solution lies in the following: you can never have more than one connection to one port. The sniffer has left some parts out, because the connections are not really to the :80 port, but are dispatched to another, higher-numbered port. However, the "from" port is such a high number that we can assume it's the real (dispatched) port. Take your example:

    snip from=(111.111.111.111:1234) to=(22.22.22.22:80)
    snip ASCII ... from=(111.111.111.111:1234), localport=(80), data here
    snip from=(111.111.111.111:1234) to=(my.box.ip.addr:80)
    snip ASCII ... from=(111.111.111.111:1234), localport=(80), data here

Because there can only be one connection from the :1234 port, the connection to 22.22.22.22:80 must have been broken before the connection to my.box.ip.addr:80 was made. On that basis you get:

    $_ = <INPUT>;
    m/.*?\(([\d\.\:]+)\).*?\(([\d\.\:]+)\)(.*)/;          # $1 = source, $2 = destination or local port, $3 = rest
    ( $2 =~ m/:/ ) && ( $mine{$1} = ($2 eq $mybox) );    # connection line: $mybox holds "ip:port"
    ( $2 =~ m/^\d+$/ ) && $mine{$1} && print $3;         # data line: print only if the source is ours

Hope this helps,

Jeroen
I was dreaming of guitarnotes that would irritate an executive kind of guy (FZ)

Update: chipmunk pointed to some typos. Thanx! Furthermore, I rewrote line 3, it's cleaner now.
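
The same idea can be written as a loop over the whole log. This is only a sketch under assumptions: the sub name `filter_by_last_connection`, the address `10.0.0.5:80`, and the sample lines are made up for illustration, and the regex assumes the first two parenthesised fields on a line are the ones of interest.

```perl
use strict;
use warnings;

# Keep only the data whose source most recently connected to $mybox.
# $mybox is "ip:port"; the value used below is a made-up example.
sub filter_by_last_connection {
    my ($mybox, @lines) = @_;
    my (%mine, @out);
    for (@lines) {
        # Capture the first two parenthesised fields and the rest of the line.
        next unless /\(([\d.:]+)\).*?\(([\d.:]+)\)(.*)/;
        my ($src, $second, $rest) = ($1, $2, $3);
        if ($second =~ /:/) {
            # Connection line: the second field is the destination ip:port.
            # A later connection from the same source overwrites the earlier one.
            $mine{$src} = ($second eq $mybox);
        }
        elsif ($mine{$src}) {
            # Data line: the second field is just the local port.
            $rest =~ s/^,\s*//;
            push @out, $rest;
        }
    }
    return @out;
}

my @demo = (
    'incoming connection from=(111.111.111.111:1234) to=(22.22.22.22:80)',
    'ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), first',
    'incoming connection from=(111.111.111.111:1234) to=(10.0.0.5:80)',
    'ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), second',
);

print "$_\n" for filter_by_last_connection('10.0.0.5:80', @demo);
```

Because each connection line overwrites the entry for its source, no explicit removal step is needed; the hash always reflects the latest connection seen from that source.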
Re: Algorithmic difficulties by chipmunk (Parson) on Dec 21, 2000 at 02:56 UTC
I think I would do something like this:

    my $my_addr = 'my.box.ip.addr:80';
    my $keep = 0;
    while (<>) {
        if (/incoming connection/) {
            $keep = /\Q$my_addr/;
        }
        print if $keep;
    }

That will go through the log and print out just the lines you're interested in. For each incoming connection, it will print all the lines up to the next incoming connection, but only if the connection line contains your IP address. (Hopefully, the ASCII data for one connection will end before a new connection starts; otherwise you're kinda outta luck...)
(tye)Re: Algorithmic difficulties by tye (Sage) on Dec 21, 2000 at 02:55 UTC
It looks to me like there is no perfect answer here. In your second example, couldn't the second ASCII packet have been either one you wanted to ignore or not? It could have been the second packet for the connection to 22.22.22.22 or the first packet to my.box.ip.addr. You could certainly delete old connections when you get a new connection to the same "local" port number that you don't want to ignore.

- tye (but my friends call me "Tye")
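
One way to read the deletion suggestion, as a rough sketch: key the ignore list on the source plus the local port, and delete the entry when a wanted connection arrives with the same key. The sub name `wanted_data`, the address `10.0.0.5`, and the sample lines are assumptions for illustration. It cannot resolve the ambiguity described above; it simply assumes that data seen after a new connection belongs to that connection.

```perl
use strict;
use warnings;

# Ignore list keyed by "source ip:port" plus local port. An entry is added for
# a connection to another box and deleted when the same source and port
# connect to ours. $my_ip is a hypothetical placeholder.
sub wanted_data {
    my ($my_ip, @lines) = @_;
    my (%ignore, @wanted);
    for my $line (@lines) {
        if ($line =~ /from=\(([^)]+)\)\s+to=\(([^):]+):(\d+)\)/) {
            my ($src, $dst_ip, $port) = ($1, $2, $3);
            my $key = "$src $port";
            if ($dst_ip eq $my_ip) { delete $ignore{$key} }
            else                   { $ignore{$key} = 1 }
        }
        elsif ($line =~ /from=\(([^)]+)\),\s*localport=\((\d+)\)/) {
            # Data lines carry only the source and the local port.
            push @wanted, $line unless $ignore{"$1 $2"};
        }
    }
    return @wanted;
}

my @sample = (
    'incoming connection from=(111.111.111.111:1234) to=(22.22.22.22:80)',
    'ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), first',
    'incoming connection from=(111.111.111.111:1234) to=(10.0.0.5:80)',
    'ASCII data in TCP packet from=(111.111.111.111:1234), localport=(80), second',
    'incoming connection from=(33.33.33.33:4321) to=(22.22.22.22:8080)',
    'ASCII data in TCP packet from=(33.33.33.33:4321), localport=(8080), third',
);

print "$_\n" for wanted_data('10.0.0.5', @sample);
```

Only the line ending in "second" is printed here; the "first" and "third" lines are dropped because their keys are still on the ignore list when they arrive.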