Greetings fellow monks
I have an efficiency problem in a little tool I'm writing.
Essentially I have two arrays, one containing lines from a logfile, as produced by syslog, and one containing regular expressions.
I wish to remove from the loglines array all lines which match any of the regular expressions in my list.
My current code looks like this:
my @REGEXPS = (
    # patterns describing the lines I want to throw away
    '^\w{3} [ :0-9]{11} [._[:alnum:]-]+ pppd\[[0-9]+\]: (sent|rcvd) \[LCP EchoReq id=[[:alnum:]]+ magic=[ [:alnum:]]+\]$',
    '^\w{3} [ :0-9]{11} [._[:alnum:]-]+ ssh\(pam_[[:alnum:]]+\)\[[0-9]+\]: session opened for user [[:alnum:]-]+ by \(uid=[0-9]+\)$',
);

my @KEEP = ();
foreach my $line ( @SYSLOG ) {
    my $match = 0;
    foreach my $r ( @REGEXPS ) {
        if ( $line =~ /$r/ ) {
            $match = 1;
        }
    }
    # keep only the lines that matched none of the patterns
    if ( ! $match ) {
        push @KEEP, $line;
    }
}
This is slow, because I'm testing every regular expression against every line, so the number of match attempts is (number of lines) x (number of patterns).
What are my alternatives?
I've considered moving the lines into a hash instead and using delete on them (roughly as in the first sketch below), but I'm not sure how much of a gain this would be.
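For what it's worth, the hash version I had in mind looks roughly like this (untested; I'm keying the hash on line number so that I can delete entries and still recover the survivors in their original order):

# Rough sketch of the hash/delete idea, assuming @SYSLOG and @REGEXPS as above.
# It still tests every pattern against every surviving line, so I suspect
# the overall amount of work is much the same.
my %lines;
@lines{ 0 .. $#SYSLOG } = @SYSLOG;    # key = original line number

for my $i ( keys %lines ) {
    for my $r ( @REGEXPS ) {
        if ( $lines{$i} =~ /$r/ ) {
            delete $lines{$i};
            last;                     # no point trying the other patterns
        }
    }
}

my @KEEP = @lines{ sort { $a <=> $b } keys %lines };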
I also can't help thinking I should be using map or grep for this, but I'm not entirely sure how to go about it; the closest I've come is the second sketch below.
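This is the grep formulation I've been toying with (again untested). It reads more nicely, and compiling the patterns once with qr// should save a little, but as far as I can tell it still checks every line against every pattern:

# Sketch of the grep formulation, assuming @SYSLOG and @REGEXPS as above.
my @compiled = map { qr/$_/ } @REGEXPS;

my @KEEP = grep {
    my $line = $_;                    # inner grep reuses $_ for the patterns
    ! grep { $line =~ $_ } @compiled;
} @SYSLOG;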
(Yes, this replicates the functionality of the logcheck tool - it's a prototype rewrite in Perl that I'm producing.)
Steve