Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
```bat
for %%f in (htt*) do findstr "mail.yahoo aolmail hotmail" %%f > %%f.mail
for %%f in (htt*) do findstr "some other string" %%f > %%f.mail
```

etc.

I figured that since the batch file essentially reads these files five times, I could write a Perl script that reads each file once and does all five comparisons on each line. Since I'd only be reading the files once, it should be more efficient. Imagine my surprise when, after writing the script, I found that the Perl version takes about twice as long to complete as the batch file. Can anyone shed some light on this? In particular, I'm interested in whether there's a more efficient way to do pattern matching. Here's the script:
```perl
use Cwd;

@logs    = <@ARGV>;
%reports = ();
%results = ();

# First, read in the input file to populate our data structure.
# The structure is a hash of arrays where the hash key is the name for the output file.
# The first element of the array is the string we're looking for, the second is the email
# address, and the third is the expiration date.

# Get the current date for use in the upcoming loop.
($sec, $min, $hr, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time);
# Apply the offsets to the year and month so we can do a straight comparison.
$year = $year + 1900;
$mon  = $mon + 1;
$path = cwd;

open(INPUT_IN, "bat/FindSitesInput.txt") || die "Can't open input file! :$!";
while (defined($currentLine = <INPUT_IN>)) {
    $_ = $currentLine;
    if (!m/^\#/) {
        # Split currentLine into tab-delimited tokens.
        @tokens = split("\t", $currentLine);
        # Only create an entry if the report is still valid.
        if (!isExpired($tokens[5])) {
            # Since we're using references, temp must be declared with "my" to ensure
            # we create an array that is local to this block of code.
            my @temp = ($tokens[1], $tokens[2], $tokens[3], $tokens[4], $tokens[5],
                        $tokens[6], $tokens[7], $tokens[8], $tokens[9]);
            # Can only store a reference to an array in the hash (not the array itself).
            $reports{$tokens[0]} = \@temp;
            # Initialize the results structure here.
            my @data = ();
            $results{$tokens[0]} = \@data;
        }
    }
}
close(INPUT_IN) || die "Can't close input file: $!";

# Now run the reports on each log file.
foreach $logfile (@logs) {
    print "\nSearching $logfile\n";
    open(LOG_IN, "$logfile") || die "Can't open $logfile! :$!";
    while (defined($currentLine = <LOG_IN>)) {
        # Only want to search the files once (obviously),
        # therefore must apply all the tests in the "reports" data structure
        # to each line in the file.
        # Logs are space delimited.
        @tokens = split(" ", $currentLine);
        foreach $rep (keys %reports) {
            @comparisons = split(" ", $reports{$rep}[0]);
            $match = 0;
            foreach $item (@comparisons) {
                if ($reports{$rep}[1] eq 'site' && !$match) {
                    $_ = $tokens[6];
                    if (/$item/) { $match = 1; }
                }
                elsif ($reports{$rep}[1] eq 'ip' && !$match) {
                    $_ = $tokens[2];
                    if (/$item/) { $match = 1; }
                }
            }
            if ($match && $reports{$rep}[2] eq 'normal') {
                push @{$results{$rep}}, $currentLine;
            }
            elsif (!$match && $reports{$rep}[2] eq 'reverse') {
                push @{$results{$rep}}, $currentLine;
            }
        }
    }
    close(LOG_IN) || die "Can't close log file: $!";

    # Now write the output files for each report for this log.
    foreach $rep (keys %results) {
        if (!(-d $reports{$rep}[5])) {
            system("mkdir $reports{$rep}[5]");
        }
        open(OUTPUT, ">$reports{$rep}[5]/$logfile-$rep")
            || die "Can't open $reports{$rep}[5]/$logfile-$rep! :$!";
        print OUTPUT @{$results{$rep}};
        # Clear this report's buffered lines so they are not repeated in the next log's output.
        @{$results{$rep}} = ();
        if (close OUTPUT) {
            if (!($reports{$rep}[3] eq 'none')) {
                system("c:/blat/blat \"$path/$reports{$rep}[5]/$logfile-$rep\" -t \"$reports{$rep}[3]\"");
            }
            if ($reports{$rep}[6] eq 'true') {
                # Call IPandBytes.pl.
                system("perl bat/ipandbytes.pl $reports{$rep}[5]/$logfile-$rep");
                # Remove the file if desired.
                if ($reports{$rep}[7] eq 'false') {
                    system("del $reports{$rep}[5]\\$logfile-$rep");
                }
            }
            if ($reports{$rep}[8] eq 'true') {
                system("pkzip25 -add -max $reports{$rep}[5]/$logfile-$rep");
                system("move $reports{$rep}[5]/$logfile-$rep.zip /zip");
            }
        }
        else {
            die "Can't close output file: $!";
        }
    }
}

# This function takes 1 argument (a date string of the form "MM/DD/YYYY").
# $year, $mday and $mon must be initialized prior to calling this function.
# isExpired returns false if the argument is chronologically after the date
# represented by $mon, $mday and $year (or contains "none"); otherwise it returns true.
sub isExpired {
    $_ = pop @_;
    chomp;
    $temp = "none";
    if (/$temp/) {
        return 0;
    }
    /([0-9]+)\/([0-9]+)\/([0-9][0-9][0-9][0-9])/;
    $repMon  = $1;
    $repDay  = $2;
    $repYear = $3;
    if ($repYear > $year) {
        return 0;
    }
    elsif ($repMon > $mon && $repYear == $year) {
        return 0;
    }
    elsif ($repDay > $mday && $repMon == $mon && $repYear == $year) {
        return 0;
    }
    else {
        return 1;
    }
}
```
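In the script above, every log line triggers, for every report, a fresh `split` of the search string and a `/$item/` match whose interpolated pattern changes from one iteration to the next, so Perl has to recompile the regex on nearly every match. A common remedy is to do that work once, when `%reports` is loaded: join each report's terms into one alternation and compile it with `qr//`. The sketch below assumes the same array layout as `%reports` (`[0]` terms, `[1]` `'site'` or `'ip'`, `[2]` `'normal'` or `'reverse'`) and the same field positions (token 6 for sites, token 2 for IPs); the report names, terms, the helper `reports_for_line`, and the sample log line are hypothetical, and it has not been benchmarked against the batch file.

```perl
use strict;
use warnings;

# Hypothetical report table, laid out like %reports in the script:
# [0] space-separated terms (used as regexes, as /$item/ did),
# [1] 'site' or 'ip' (which log field to test), [2] 'normal' or 'reverse'.
my %reports = (
    webmail  => [ 'mail.yahoo aolmail hotmail', 'site', 'normal'  ],
    notlocal => [ '^10\.',                      'ip',   'reverse' ],
);

# Field positions taken from the script: token 6 = site, token 2 = IP.
my %field_for = ( site => 6, ip => 2 );

# Build one compiled regex per report, once, before any log line is read.
my %matcher;
for my $rep (keys %reports) {
    my ($terms, $type, $mode) = @{ $reports{$rep} }[0 .. 2];
    my $alternation = join '|', map { "(?:$_)" } split ' ', $terms;
    $matcher{$rep} = {
        re     => qr/$alternation/,
        field  => $field_for{$type},
        invert => ($mode eq 'reverse'),
    };
}

# Hypothetical helper: returns the names of the reports that select this line.
sub reports_for_line {
    my ($line) = @_;
    my @tokens = split ' ', $line;
    my @hits;
    for my $rep (sort keys %matcher) {
        my $m     = $matcher{$rep};
        my $value = $tokens[ $m->{field} ];
        my $match = defined $value && $value =~ $m->{re};
        # 'normal' keeps matching lines, 'reverse' keeps non-matching ones.
        push @hits, $rep if $match xor $m->{invert};
    }
    return @hits;
}

# Usage with a made-up space-delimited log line.
my @hits = reports_for_line(
    '2003-06-10 19:25:00 10.0.0.5 GET /index.html 200 mail.yahoo.com');
print "@hits\n";    # prints "webmail"
```

Because the terms are still interpolated as regexes, `mail.yahoo` keeps its original meaning (the `.` matches any character); wrapping each term in `quotemeta` would make them literal. Unlike the original, this sketch treats any mode other than `'reverse'` as `'normal'`.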
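The year/month/day cascade in `isExpired` can be collapsed into a single numeric comparison by turning both dates into `YYYYMMDD` integers. A minimal sketch follows that keeps `isExpired`'s rule (a date of today or earlier is expired; a value containing "none" never expires). The name `is_expired`, the extra `$today_ymd` parameter used in place of the global `$year`/`$mon`/`$mday`, and the choice to treat a malformed date as expired are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical replacement for isExpired: returns 1 if an "MM/DD/YYYY" date is
# today or earlier (same rule as isExpired), 0 if it is later or contains "none".
# $today_ymd is today's date as a single YYYYMMDD number, e.g. 20030610.
sub is_expired {
    my ($date, $today_ymd) = @_;
    chomp $date;
    return 0 if $date =~ /none/;
    # Assumption: an unparseable date counts as expired, so that report is skipped.
    my ($m, $d, $y) = $date =~ m{(\d+)/(\d+)/(\d{4})} or return 1;
    return ($y * 10000 + $m * 100 + $d) <= $today_ymd ? 1 : 0;
}

# Build today's YYYYMMDD number (localtime's month is 0-based, year is offset by 1900).
my (undef, undef, undef, $mday, $mon, $year) = localtime;
my $today = ($year + 1900) * 10000 + ($mon + 1) * 100 + $mday;
print is_expired('12/31/2099', $today), "\n";    # prints 0: the date is still in the future
```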
Re: Pattern matching speed
by chromatic (Archbishop) on Jun 10, 2003 at 19:25 UTC

Re: Pattern matching speed
by BrowserUk (Patriarch) on Jun 10, 2003 at 23:30 UTC