Re: Pattern matching speed

There are several things that would probably help to improve the performance of you script.

Rather than using system to start a shell to do things like creating directories and deleting files, you should use the builtin functions: mkdir and unlink. You could also (probably) save shelling out twice when you zip the data and then move it, by combining the two command into one using the '&&' conditional command seperator. Ie.

system('pkzip25 ..... && move .... \zip');

If the data being zipped is produced by the other perl script you are running, then you might look at using one of the zip modules from cpan to write directly into the zip.

In the first while loop, you split $currentLine into the global array @tokens. You then decide whether to store this data in to your hash, the re-assign the first @tokens1..9 (one at a time) into a myd array @temp, so that you can assign a reference to that array in your hash. That whole process is much more easily acheived using an array slice and an anonymous array

$reports{$tokens[0]} = [ @tokens[1..9] ];

Probably another micro-optimisation, but easier to type and read if nothing else.

It generally makes sense to use lexical my variables rather than globals. Firstly, these tend to be somewhat quicker to access than globals. Its a micro-optimisation, but worth having if you need better performance.

The other benefit of doing so it that it would allow you to use strict, which would have caught your typo on line 173.

elsif ($repDay > $mday && $repMon == $mon && repYear == $year)

Did you spot it yet? Perl did once strict was in force. You omitted the sigil ($) on repYear. It may not break the code but it would be better if it was corrected.

In your inner-most (and therefore most repeated) loop, you are setting a flag if you get a match. You also test that variable on each subsequent iteration and don't attempt further matches if you've already found one, but you still iterate through all of @comparisons. You can easily short-circuit that by using last to exit the loop as soon as you find a match.

It's quite likely, but there is not enough information on what is in your @comparisons array, that you could join your search terms together and perform the match in a single regex (used twice), rather than looping over them.

Something like

@comparisons = map{ quotemeta } @comparisons;
my $re_comps = '(?:' . join( '|', @comparisons ) . ');
$re_comps = qr[$re_comps];
[download]

I haven't done this as the best way of doing will depend on the nature of the contents of that array. Eg. If they can be regexes themselves, then you would need to treat them differently.

You (and several other people:) will probably hate what I've done to your script. In particular, sorry that the comments got 'lost' during refactoring. It should at least do the same as your original, and with a bit of luck might run a little quicker. It might give you a starting place to go forward from.

It would very much be worth using Devel::SmallProf or similar to profile the program to find out where the time is being spent, before making further optimisations, or even accepting those I've offered.

#! perl -w
use strict;
use Cwd;

my @logs = <@ARGV>;
my %reports=();
my %results=();

my ($mday,$mon,$year) = (localtime(time))[3 .. 5];
$year = $year +1900;
$mon = $mon +1;

my $path = cwd;
open INPUT_IN, "bat/FindSitesInput.txt"
    or die "Can't open input file! :$!";

while( <INPUT_IN> ) {
    next if  m/^\#/;
    my @tokens = split("\t");

    if( !isExpired( $tokens[5] ) ) {
        $reports{$tokens[0]}= [ @tokens[1..9] ];
        $results{$tokens[0]}= [ ];
    }
}

close INPUT_IN or die "Can't close input file: $!";

foreach my $logfile ( @logs ) {
    print "\nSearching $logfile\n";
    open LOG_IN, $logfile
        or die "Can't open $logfile! :$!";

    while( my $currentLine = <LOG_IN> ) {

        my @tokens = split ' ', $currentLine;

        foreach my $rep (keys %reports) {

            my @comparisons = split( ' ', $reports{$rep}[0] );
            my $match = 0;
            foreach my $item (@comparisons) {

                if( $reports{$rep}[1] eq 'site' ) {
                    $match = 1 if $tokens[6] =~ /$item/;
                }
                elsif( $reports{$rep}[1] eq 'ip' ) {
                    $match = 1 if $tokens[2] =~ /$item/;
                }
                last if $match;
            }
            if( $match && $reports{$rep}[2] eq 'normal' ) {
                push @{$results{$rep}}, $currentLine;
            }
            elsif( !$match && $reports{$rep}[2] eq 'reverse' ) {
                push @{$results{$rep}}, $currentLine;
            }
        }
    }
    close (LOG_IN) || die "Can't close log file: $!";

    foreach my $rep (keys %results) {
        mkdir $reports{$rep}[5] if not! -d $reports{$rep}[5];

        open OUTPUT, ">$reports{$rep}[5]/$logfile-$rep"
            or die "Can't open $reports{$rep}[5]/$logfile-$rep! :$!";

        print OUTPUT @{$results{$rep}};
        close OUTPUT or die "Can't close output file: $!";
        system( "c:/blat/blat \"$path/$reports{$rep}[5]/$logfile-$rep\
+" -t \"$reports{$rep}[3]\"" )
            unless $reports{$rep}[3] eq 'none';

        if(($reports{$rep}[6] eq 'true')) {
            system( "perl bat/ipandbytes.pl $reports{$rep}[5]/$logfile
+-$rep" );
            unlink "$reports{$rep}[5]\\$logfile-$rep"
                if $reports{$rep}[7] eq 'false';
        }
        if( $reports{$rep}[8] eq 'true' ) {
            system( "pkzip25 -add -max $reports{$rep}[5]/$logfile-$rep
+" );
            system( "move $reports{$rep}[5]/$logfile-$rep.zip /zip" );
        }
    }
}

sub isExpired {
    my $date = shift;
    chomp $date;
    return 0 if $date =~ /none/;

    my( $repMon, $repDay, $repYear ) = $date =~ /([0-9]+)\/([0-9]+)\/(
+[0-9][0-9][0-9][0-9])/;

    return 0 if $repYear > $year
             or $repMon  > $mon && $repYear == $year
             or $repDay  > $mday && $repMon == $mon && $repYear == $ye
+ar;
    return 1;
}
[download]

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

Comment on Re: Pattern matching speed Select or Download Code