Warning in advance: this is a somewhat long write-up... readmore has been utilized.

I have lines of text that look similar to this:

1 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 80 [SYN]
2 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 113 [SYN]
3 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 123 [SYN]
4 120 2.3.4.5 -> 5.4.3.2 ICMP ? > ? echo (ping) reply
5 120 2.3.4.5 -> 5.4.3.2 ICMP ? > ? echo (ping) request
6 120 2.3.4.5 -> 5.4.3.2 ICMP ? > ? echo (ping) reply
7 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 562 [RST]
8 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 36 [RST]
9 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 90 [RST]

For interested parties: the data comes from Ethereal packet capture frames. I extract the data from individual packets and create a summary report resembling the above, which is then filtered through an algorithm that detects incoming attacks based on packet signatures and thresholds (read: an IDS).

For these lines, a base regexp can be applied:
/\d+\s+(\d+)\s+(SOURCE_ADDR)\s+->\s+(DEST_ADDR)\s+(PROT)\s+(SOURCE_PORT)\s+>\s+(DEST_PORT)/

I want to write a CGI script that substitutes user-supplied values for the capitalized pieces in the above regexp (or a match on non-whitespace if a value is omitted) and then returns the lines that match the generated regexp.

For example:

user input:
SOURCE_ADDR=1.2.3.4 PROT=TCP

would generate the regexp:
/\d+\s+(\d+)\s+(1.2.3.4)\s+->\s+(\S+)\s+(TCP)\s+(\S+)\s+>\s+(\S+)/

and would match lines 1-3,7-9 of the above data set.
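
The substitution described above can be sketched as follows. This is a minimal sketch, not the code from the post: `build_regexp` is a hypothetical helper, the sample lines are copied from the data set above, and it assumes user values should be matched literally, so it passes them through `quotemeta` (which turns `1.2.3.4` into `1\.2\.3\.4` rather than leaving the dots as wildcards).

```perl
use strict;
use warnings;

# Template from the post; the capitalized names are placeholders.
my $template = '\d+\s+(\d+)\s+(SOURCE_ADDR)\s+->\s+(DEST_ADDR)\s+(PROT)\s+(SOURCE_PORT)\s+>\s+(DEST_PORT)';

# Hypothetical helper: fill each placeholder with the user's value,
# or with \S+ when the value was omitted.
sub build_regexp {
    my (%input) = @_;
    my $pattern = $template;
    # One pass over the template, so substituted text is never rescanned.
    $pattern =~ s{\b(SOURCE_ADDR|DEST_ADDR|PROT|SOURCE_PORT|DEST_PORT)\b}
                 {defined $input{$1} ? quotemeta($input{$1}) : '\S+'}ge;
    return qr/$pattern/;
}

my $re = build_regexp(SOURCE_ADDR => '1.2.3.4', PROT => 'TCP');

my @sample = (
    '1 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 80 [SYN]',
    '4 120 2.3.4.5 -> 5.4.3.2 ICMP ? > ? echo (ping) reply',
    '7 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 562 [RST]',
);

# Print only the sample lines that match the generated regexp.
print "$_\n" for grep { $_ =~ $re } @sample;
```

With the three sample lines shown, only the two TCP lines from 1.2.3.4 should be printed.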

However, I need a way to generate regexps that will not match certain values, such as the following:

user input: PROT=!TCP
would match lines 4-6 above.

user input: DEST_PORT=!80
would match lines 2-9 above.

I'd like to be able to do this without using several if statements. However, the only negative regexp operators Perl offers are look-ahead and look-behind; I need an operator that will "match anything not equal to."

Does such an operator exist? Is there some combination of look-ahead/look-behind that I could use to do what I want?
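
One possible combination, offered as a sketch rather than a definitive answer: a negative look-ahead that rejects the field only when the unwanted value is the entire field. `field_pattern` is a hypothetical helper, and the sketch assumes every field is a single run of non-whitespace. The inner `(?!\S)` requires the field to end right after the literal, so `!TCP` rejects `TCP` but still accepts a value such as `TCPX`, which a plain `(?!TCP)` would also throw out.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one user value into a regexp fragment for a
# single whitespace-delimited field.
sub field_pattern {
    my ($value) = @_;
    # Omitted value: any run of non-whitespace.
    return '\S+' unless defined $value && length $value;
    if ($value =~ /^!(.+)$/) {
        my $literal = quotemeta $1;
        # Fail only if the field is exactly $literal; otherwise take the field.
        return "(?!${literal}(?!\\S))\\S+";
    }
    # Plain value: match it literally.
    return quotemeta $value;
}

my $prot = field_pattern('!TCP');
my $re   = qr/^\d+\s+\d+\s+\S+\s+->\s+\S+\s+($prot)\s/;

for my $line ('1 60 1.2.3.4 -> 4.3.2.1 TCP 3456 > 80 [SYN]',
              '4 120 2.3.4.5 -> 5.4.3.2 ICMP ? > ? echo (ping) reply') {
    print "$line\n" if $line =~ $re;
}
```

Of the two sample lines, only the ICMP one should be printed.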

Here's the current code:

#!/usr/bin/perl -w

use strict;
use CGI;

use vars qw($data_file);
$data_file = 'data.txt';

{
    my $cgi = CGI->new;
    my $custom_regexp = '\d+\s+(\d+)\s+(SOURCE_ADDR)\s+->\s+(DEST_ADDR)\s+(PROT)\s+(SOURCE_PORT)\s+>\s+(DEST_PORT)';
    my %user_param;

    # An omitted parameter falls back to "any run of non-whitespace".
    foreach my $key (qw(source_addr dest_addr prot source_port dest_port)) {
        my $value = $cgi->param($key);
        $user_param{$key} = defined($value) ? $value : '\S+';
    }

    my $new_sig = $custom_regexp;

    # A leading "!" negates the value. The fragment is spliced inside the
    # placeholder's existing parentheses, giving (?!X)(\S+)(?<!X).
    foreach my $key (keys %user_param) {
        if($user_param{$key} =~ /^!(.+?)$/) {
            $user_param{$key} = "?!$1)(\\S+)(?<!$1";
        }
    }

    $new_sig =~ s/SOURCE_ADDR/$user_param{'source_addr'}/;
    $new_sig =~ s/DEST_ADDR/$user_param{'dest_addr'}/;
    $new_sig =~ s/PROT/$user_param{'prot'}/;
    $new_sig =~ s/SOURCE_PORT/$user_param{'source_port'}/;
    $new_sig =~ s/DEST_PORT/$user_param{'dest_port'}/;

    print "$new_sig\n";

    # Print every line of the data file that matches the generated regexp.
    open(my $fh, '<', $data_file) or die "Cannot open $data_file: $!";
    while(my $pkt = <$fh>) { print $pkt if $pkt =~ /$new_sig/; }
    close $fh;
}

But (?!$1)(\S+)(?<!$1) works only as long as the field I'm matching contains no whitespace. If it does, the \S+ doesn't match.

This may seem unimportant in the present application, but eventually, I'd like to be able to add a TCP_TYPE param, and those can resemble:

[SYN] or [SYN, ACK] (other values may be present besides SYN and ACK)

So using \[(?!SYN)(\S+)(?<!SYN)\] would fail on a [SYN, ACK] packet.
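
For a bracketed field like this, one option is to match on the brackets instead of on whitespace. The following is only a sketch under stated assumptions: `flags_pattern` is a hypothetical helper, the flag list is assumed to be always wrapped in `[` and `]` with no nested brackets, and `!SYN` is taken to mean "anything other than exactly `[SYN]`", so `[SYN, ACK]` counts as a match.

```perl
use strict;
use warnings;

# Hypothetical helper for the proposed TCP_TYPE parameter.
sub flags_pattern {
    my ($value) = @_;
    # Omitted value: any bracketed list, spaces included.
    return '\[[^\]]*\]' unless defined $value && length $value;
    if ($value =~ /^!(.+)$/) {
        my $literal = quotemeta $1;
        # Reject only when the literal is followed directly by the closing bracket.
        return "\\[(?!${literal}\\])[^\\]]*\\]";
    }
    # Plain value: the exact bracketed list.
    return '\[' . quotemeta($value) . '\]';
}

my $not_syn = flags_pattern('!SYN');

for my $flags ('[SYN]', '[SYN, ACK]', '[RST]') {
    printf "%-12s %s\n", $flags,
        ($flags =~ /\A$not_syn\z/ ? 'matches' : 'does not match');
}
```

The character class `[^\]]*` is what lets the field contain spaces and commas, which `\S+` cannot do.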

Any help would be greatly appreciated. Thank you for taking the time to read all of this, and thank you in advance for any replies that help me toward my goal.

In reply to Runtime Regexp Generation by tekkie
