(jcwren) Re: I want to be a Perl Jedi
by jcwren (Prior) on Jul 18, 2000 at 17:23 UTC
First, allow me to compliment you for reading the site FAQs and using the <code></code> tags.
I'd probably start by creating an array to hold the values you're searching for, and using a for loop to check for each of those values. That way, adding new ports becomes pretty simple. It's not *the* most efficient way, but your script is hardly going to bring a server to its knees, and we're told that maintainability is worth more than a little performance hit.
For checking file size, use the '-s' operator, which returns the size of the file in bytes (or false if the file is empty or missing). '-z' tells you whether the file is zero-length. 'perldoc -f -s' describes both, along with the other file test operators.
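A minimal sketch of those file test operators, assuming you want a human-readable status line; describe_size is a hypothetical helper name, not part of any module. It stats the file once with -e and then reuses that result through the special '_' filehandle:

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a file exists, is empty, or how big it is.
sub describe_size {
    my ($file) = @_;
    return "$file does not exist" unless -e $file;
    return "$file is empty" if -z _;          # '_' reuses the stat done by -e
    return "$file is " . (-s _) . " bytes";   # -s gives the size in bytes
}
```

You might call it as print describe_size('fw.log'), "\n"; where fw.log is just a placeholder name.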
For mailing the results, check out the Mail::Mailer module, which makes sending mail easy. There are actually several modules for sending e-mail, and some people will argue that some are better than others, for whatever reasons. This one is a good place to start, and you can poke around the Mail:: and Net:: areas on CPAN for others.
--Chris
e-mail jcwren
RE: I want to be a Perl Jedi
by ferrency (Deacon) on Jul 18, 2000 at 17:59 UTC
As jcwren said, putting the ports you're looking for in an array, and then looking for them all sequentially is a good start:
my @bad_ports = qw(12345 12346 20034 8787 31337 31338 54320 54321);
while (<FWLOG>) {
    foreach my $port (@bad_ports) {
        if (/\b$port\b/) {
            print;
            last;
        }
    }
}
The problem with this code's performance is that each time through the foreach() loop, the regular expression is recompiled. One step better would be to build an alternation out of the array:
# I'd separate this out and put it in a configuration section of the script
my $bad_ports = '12345|12346|20034|8787|31337|31338|54320|54321';
while (<FWLOG>) {
    print if /\b(?:$bad_ports)\b/o;
}
The /o on the regular expression means "even though there's a variable in there, it won't change, so compile the regex only once."
In the spirit of "there's more than one way to do it," the last option I'll present is to use a hash. In this method, you do one regular expression match on your logfile line to extract the port (I don't know what your log lines look like, so you'll need to do that), and then you do a hash lookup to see if it's a bad port.
# Build the hash
my %bad_ports = (12345=>1, 12346=>1, 20034=>1, 8787=>1, 31337=>1); # short list
while (<FWLOG>) {
    my $port = extract_port_from($_); # fill in a regex here
    print if $bad_ports{$port};
}
I used a hash here, although an array would work too, since the keys are all integers. However, I believe an array would allocate space for every index from 0 to 54321, which is not ideal.
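To make the hash version concrete, here is a sketch that assumes, purely for illustration, that each firewall log line carries the destination port as dport=NNNN. extract_port_from and is_bad_port_line are hypothetical helpers, and the regex must be adapted to your real log format. Building the hash with map also keeps the ports as a plain, easy-to-edit list:

```perl
use strict;
use warnings;

# Ports to flag; map turns the plain list into a hash of port => 1.
my %bad_ports = map { $_ => 1 } qw(12345 12346 20034 8787 31337 31338 54320 54321);

# Hypothetical helper: ASSUMES the destination port appears as "dport=NNNN".
# Change the regex to match your firewall's actual log format.
sub extract_port_from {
    my ($line) = @_;
    return $line =~ /\bdport=(\d+)/ ? $1 : undef;
}

# True if the line's port is one of the ports we are watching for.
sub is_bad_port_line {
    my ($line) = @_;
    my $port = extract_port_from($line);
    return defined $port && $bad_ports{$port};
}
```

The main loop then becomes while (<FWLOG>) { print if is_bad_port_line($_) }, and lines with no recognizable port are simply skipped.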
I hope these techniques are useful to you! And a small warning: while I made my best effort to write working code, I didn't test it.
Alan
Just because TIMTOWTDI, I put together a version using the qr// operator. I had guessed it would be more efficient than the /o version, but such is not the case:
use Benchmark;
push @lines, ((' ' x int rand 12) . (int rand 60000) . (' ' x int rand 2)) for 1..10000;
@pattern_list = qw(12345 12346 20034 8787 31337 31338 54320 54321);
my $bad_ports = '\b(?:12345|12346|20034|8787|31337|31338|54320|54321)\b';
timethese( 100, {
    'qr' => 'with_qr',
    '/o' => 'with_o'
});
sub with_qr {
    foreach $pattern (@pattern_list) {
        my $re = qr/\b${pattern}\b/;
        foreach $line (@lines) {
            $line =~ /$re/;
        }
    }
}
sub with_o {
    my $found = 0;
    for(@lines){
        (/\b(?:$bad_ports)\b/o);
    }
}
with these results:
Benchmark: timing 100 iterations of /o, qr...
/o:  7 wallclock secs ( 6.88 usr + 0.00 sys =  6.88 CPU) @ 14.53/s (n=100)
qr: 20 wallclock secs (18.84 usr + 0.00 sys = 18.84 CPU) @  5.31/s (n=100)
Granted, this is a poor dataset, and mileage will probably vary for a real logfile.
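Note that with_qr above makes eight passes over @lines (one per pattern) while with_o makes a single pass with one alternation, so the two subs do different amounts of work. A sketch of a like-for-like comparison compiles the same alternation once with qr// and times it against /o using Benchmark's cmpthese. No results are claimed here, since they will depend on your perl version and your data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same kind of synthetic data as above: random numbers padded with spaces.
my @lines = map { (' ' x int rand 12) . int(rand 60000) . (' ' x int rand 2) } 1 .. 10_000;
my @pattern_list = qw(12345 12346 20034 8787 31337 31338 54320 54321);

# Build the alternation once; compile it once with qr//.
my $alt = join '|', @pattern_list;
my $re  = qr/\b(?:$alt)\b/;

# One pass over the data with the precompiled qr// object.
sub count_qr { scalar grep { $_ =~ $re } @lines }

# One pass over the data with the same pattern compiled once via /o.
sub count_o  { scalar grep { /\b(?:$alt)\b/o } @lines }

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, { 'qr' => \&count_qr, '/o' => \&count_o });
```

Both subs should report the same number of matching lines; only the timing should differ.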
RE: I want to be a Perl Jedi
by nuance (Hermit) on Jul 18, 2000 at 17:41 UTC
Three suggestions spring to mind for combining the print statements. You could combine the tests into one regular expression:
print "$_" if /\b12345\b|
\b12346\b|
\b20034\b|
\b8787\b|
\b31337\b|
\b31338\b|
\b54320\b|
\b54321\b/x;
or combine the different regular expressions into one logical test:
print "$_" if (/\b12345\b/ or
/\b12346\b/ or
/\b20034\b/ or
/\b8787\b/ or
/\b31337\b/ or
/\b31338\b/ or
/\b54320\b/ or
/\b54321\b/);
or use a little loop:
# "\b" inside a double-quoted string is a backspace character, not a word
# boundary, so the \b anchors go inside qr// instead.
my @ports = map { qr/\b$_\b/ } qw(12345 12346 20034 8787 31337 31338 54320 54321);
while (<FWLOG>) {
    for my $port (@ports) {
        if ($_ =~ $port) {
            print;
            last;
        }
    }
}
(The idea for the last one was pinched from jcwren; the implementation was pinched from various others.)
Nuance
If you're going to start combining regular expressions, you may as well condense them further.
/\b12345\b/ and /\b12346\b/
can become
/\b1234[56]\b/
and
/\b31337\b/ and /\b31338\b/
can become
/\b3133[78]\b/
and
/\b54320\b/ and /\b54321\b/
can become
/\b5432[01]\b/
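If you do condense patterns this way, it's worth checking the result. A small sketch that compares each character-class pattern against the pair of alternatives it replaces, for every number from 0 to 65535 (the full range of TCP/UDP port numbers):

```perl
use strict;
use warnings;

# Each entry: [ condensed pattern, original pair of alternatives ].
my @pairs = (
    [ qr/\b1234[56]\b/, qr/\b12345\b|\b12346\b/ ],
    [ qr/\b3133[78]\b/, qr/\b31337\b|\b31338\b/ ],
    [ qr/\b5432[01]\b/, qr/\b54320\b|\b54321\b/ ],
);

# Count every number where the condensed and original patterns disagree.
my $mismatches = 0;
for my $n (0 .. 65535) {
    for my $pair (@pairs) {
        my ($short, $long) = @$pair;
        $mismatches++ if ($n =~ $short ? 1 : 0) != ($n =~ $long ? 1 : 0);
    }
}
print "mismatches: $mismatches\n";
```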
Hope this helps some more.
Update (to jcwren): Good point. I totally agree. I didn't realize these were port numbers and that they might change in the future.
jlistf, I can't (and won't) disagree that the regexp optimizations you've offered do optimize the search. But this is really the kind of data that should not be optimized. Since ports can change, new scanners can come (and go), etc, optimizing the regexp seriously decreases the maintainability of the script. The next person that comes along may not be a Perl programmer.
To this end, the script should contain an array at the top, with the list of port numbers, along with explicit instructions as to how to add/delete/change a port number. The next person shouldn't have to learn regexp just to update the ports to be checked for.
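Here's a sketch of what such a configuration section might look like, following that suggestion: the port list sits at the top with plain-English editing instructions, and the regex is built from it automatically, so nobody has to edit a pattern by hand. The names @bad_ports, $bad_port_re and is_bad_line are illustrative, not from the original script:

```perl
use strict;
use warnings;

######################################################################
# PORTS TO WATCH FOR
#   To add a port:    put its number on a new line inside qw( ... ).
#   To remove a port: delete its line.
#   To change a port: edit the number.
#   Only numbers go inside qw( ... ): no commas, quotes, or comments.
######################################################################
my @bad_ports = qw(
    12345
    12346
    20034
    8787
    31337
    31338
    54320
    54321
);
######################################################################
# Nothing below this line needs to change when the port list changes.
######################################################################

# Build one alternation from the list and compile it once with qr//.
my $alternation = join '|', @bad_ports;
my $bad_port_re = qr/\b(?:$alternation)\b/;

# True if a log line mentions any of the watched ports as a whole number.
sub is_bad_line {
    my ($line) = @_;
    return $line =~ $bad_port_re;
}
```

In the main loop you would then write something like while (<FWLOG>) { print if is_bad_line($_) }.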
So, you get a virtual ++ for reducing the regexp, but a virtual -- for making it less maintainable (which really means I just don't vote on the node).
--Chris
e-mail jcwren
RE: I want to be a Perl Jedi
by Adam (Vicar) on Jul 19, 2000 at 00:34 UTC
Everyone has posted good information about regular expressions, so I won't dwell on that. But I will point out that you don't need to use a shell command to get the time. Right now your code reads: chomp($DATE=`date +%d%b%y`);
The backticks quietly go off and spawn another process to get the date for you, which is very slow compared to the Perl way: localtime(time). Since you just want the day/month/year, you can do this by saying:
my @date = (localtime(time))[3..5];            # Get the date. (18, 6, 100)
++$date[1] and $date[2] += 1900;               # Make it reader friendly
for ( @date[0..1] ) { $_ = "0$_" if 10 > $_ }  # Add 0 prefixes as needed.
my $DATE = join '', @date;                     # now 18072000
As usual, this can be made tighter and more efficient; I'm not playing Perl Golf, just showing that you don't need to make a shell call to get the date. For more info, read perldoc -f localtime.
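The version above produces 18072000, which is not quite what the original date +%d%b%y call gives (18Jul00, with an abbreviated month name and a two-digit year). A minimal pure-Perl sketch that reproduces that format, assuming English month abbreviations; date_stamp is a hypothetical helper that takes the list returned by localtime (or gmtime):

```perl
use strict;
use warnings;

# Month abbreviations as date's %b prints them in the C/English locale.
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: takes the list from localtime() or gmtime() and
# returns DDMonYY, the same shape as `date +%d%b%y` (e.g. "18Jul00").
sub date_stamp {
    my ($mday, $mon, $year) = @_[3 .. 5];
    return sprintf '%02d%s%02d', $mday, $MONTHS[$mon], $year % 100;
}

my $DATE = date_stamp(localtime);
print "$DATE\n";
```

Passing the whole localtime list (rather than calling localtime inside the sub) makes the helper easy to test with a fixed time via gmtime.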
Re: I want to be a Perl Jedi
by turnstep (Parson) on Jul 19, 2000 at 04:17 UTC
Here's one way to easily store and edit the numbers. You could also store them in a separate file, but I like the DATA trick better. Either way, the list can easily be maintained by others who may not know about regexps or Perl code:
#!/bin/nsperl
use strict;
my $trojanfile = "trojan.log"; ## At the top makes it easier to find if it is changed
## Read in all the port numbers
my %number;
while (<DATA>) {
    /^#/ and next;              ## Skip lines starting with '#'
    /(\d+)/ and $number{$1}++;  ## Hash eliminates doubles gracefully
}
my $findport;
for (sort {$a <=> $b} keys %number) { ## Sort makes it easier to read
    $findport .= "$_|";
}
chop $findport; ## Yes, this is chop, not chomp!
open (FILEOUT, ">$trojanfile") or die "Can't open $trojanfile: $!\n";
print FILEOUT "Searched for: $findport\n";
## Slow but succinct:
(my $logfile=`date +%d%b%Y` . ".elog") =~ s/\n//;
open (FWLOG, "$logfile") or die "Can't open $logfile: $!\n";
while (<FWLOG>) {
    print FILEOUT if (/\b(?:$findport)\b/o); ## nice call ferrency
}
close(FWLOG);
exit;
__DATA__
## This is the list of ports
## Lines with a '#' at the start are comments
## Everything else is a port number to check for
## Even embedded comments are okay, just have the number go first
12345
12346
31337 ## Default back orifice port
8787 ## Other comments here
1823 This is a comment too, but without the nice '#'
88776
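For the separate-file variant mentioned above, a sketch that applies the same comment-skipping rules as the __DATA__ loop; load_ports and the file name ports.txt are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: read port numbers from a plain-text file using the
# same rules as the __DATA__ loop: skip lines starting with '#', and take
# the first run of digits on any other line as the port number.
sub load_ports {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!\n";
    my %number;
    while (<$fh>) {
        /^#/ and next;              # skip comment lines
        /(\d+)/ and $number{$1}++;  # hash eliminates duplicates
    }
    close $fh;
    return sort { $a <=> $b } keys %number;   # numeric order, easier to read
}
```

The pattern is then built just as before: my $findport = join '|', load_ports('ports.txt');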
Re: I want to be a Perl Jedi (note)
by lindex (Friar) on Jul 19, 2000 at 12:24 UTC
chomp($DATE=`date +%d%b%y`);
I have found a better way to do this:
use POSIX;
$DATE = strftime("%d%b%y",localtime());
A throwback to good ol' C :)
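The same POSIX::strftime call can also replace the backticks in turnstep's script. A minimal sketch building his DDMonYYYY.elog log file name without spawning date; note that %b is locale-dependent, so the month abbreviation may not be English on every system:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Same DDMonYYYY.elog name as `date +%d%b%Y` . ".elog", but with no child
# process and no trailing newline to strip.
my $logfile = strftime('%d%b%Y', localtime) . '.elog';
print "$logfile\n";
```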
/****************************/
jason@gost.net, wh@ckz.org
http://jason.gost.net
/*****************************/