in reply to Splitting Apache Log Files

It looks like you are recompiling each regular expression once for each line of input to be scanned. Compiling regular expressions can be expensive, so performance might be improved if you pre-compile all the regular expressions.
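A minimal sketch of the compile-once idea. The `@patterns` list and the sample log lines are hypothetical stand-ins for the real rules and input. Every pattern is compiled a single time with `qr//` before any input is scanned. Each line is then matched against the already-compiled objects.

```perl
use strict;
use warnings;

# Hypothetical rule patterns, read in as plain strings.
my @patterns = ('GET /images/', 'POST ', ' 404 ');

# Compile each pattern exactly once, before any input is scanned.
my @compiled = map { qr/$_/ } @patterns;

# Hypothetical log lines standing in for the real input.
my @lines = (
    '127.0.0.1 - - "GET /images/logo.png HTTP/1.1" 200 512',
    '127.0.0.1 - - "POST /login HTTP/1.1" 302 0',
    '127.0.0.1 - - "GET /missing HTTP/1.1" 404 0',
);

# Count how many lines each compiled pattern matches.
my @hits = (0) x @compiled;
for my $line (@lines) {
    for my $i (0 .. $#compiled) {
        $hits[$i]++ if $line =~ $compiled[$i];
    }
}

print "$patterns[$_]: $hits[$_]\n" for 0 .. $#patterns;
```

To see the difference on real data, the core Benchmark module (`cmpthese`) can time this loop against one that interpolates the pattern strings directly.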

http://www.stonehenge.com/merlyn/UnixReview/col28.html provides a nice introduction to some of the options for compiling regular expressions.

Re^2: Splitting Apache Log Files
by cmm7825 (Novice) on Apr 26, 2010 at 18:19 UTC
    Thanks for the link. When I read the words, I use the qr// operator. It was my understanding that this compiles the regular expression.

      qr// is a regular expression quote, and as such does, in a sense, compile regular expressions. Unfortunately, you're using the regular expression as a hash key, at which point it's turned back into a string. As you process the Apache log file, $rule is just a string. When you use it as a regular expression, it has to be compiled again, each time through the loop.
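A small demonstration of the stringification described above. The pattern and the file name are made up for illustration. `ref` reports a `qr//` value as a `Regexp`, but the same value comes back out of a hash as a plain string.

```perl
use strict;
use warnings;

# A compiled regex is a blessed reference of class Regexp.
my $re = qr/error/i;
print ref($re), "\n";            # prints "Regexp"

# Hash keys are always strings, so using $re as a key stringifies it.
my %rules = ($re => 'error.log');
my ($key) = keys %rules;
print ref($key) eq '' ? "plain string: $key\n" : "still a Regexp\n";

# The string still works as a pattern, but Perl has to compile it from
# the string again (in a loop cycling through several different rules,
# that happens whenever the pattern changes).
print "matched\n" if 'Fatal ERROR here' =~ $key;
```

The exact text of the stringified key depends on the Perl version, so only its type is worth relying on.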

      If I were writing your code, I would store the regular expression rules/filehandles in an array. Here's a sketch of what it might look like:

      my @rules;    # not %rules.

      ...

      # Process input file of processing rules
      while (<INFILE>) {
          ...
          push @rules, { regex => qr/$string/, file_handle => $fh };
      }

      ...

      # Read Apache log file and print to various other files
      while (my $line = <STDIN>) {
          for my $rule_ref (@rules) {
              my $regex = $rule_ref->{regex};
              my $fh    = $rule_ref->{file_handle};
              if ($line =~ $regex) {
                  print $fh $line;
              }
          }
      }
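A self-contained version of the array-of-rules approach, for anyone who wants to run it end to end. It rests on a few assumptions. The rules file is taken to hold one `pattern<TAB>output name` per line. In-memory filehandles (`open` on a scalar reference) stand in for the real rules file, STDIN, and output files. The format and all sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rules file contents: "pattern<TAB>output name" per line.
my $rules_text = "GET /images/\timages\n 404 \tnot_found\n";

# Hypothetical log input standing in for STDIN.
my $log_text = join '',
    qq{1.2.3.4 - - "GET /images/a.png HTTP/1.1" 200 10\n},
    qq{1.2.3.4 - - "GET /nope HTTP/1.1" 404 0\n},
    qq{1.2.3.4 - - "GET /index.html HTTP/1.1" 200 99\n};

my @rules;    # an array, so each qr// object survives intact
my %output;   # output name => captured text (stand-in for real files)

open my $rules_in, '<', \$rules_text or die "cannot read rules: $!";
while (my $rule_line = <$rules_in>) {
    chomp $rule_line;
    my ($string, $name) = split /\t/, $rule_line, 2;
    $output{$name} = '';
    # In-memory filehandle that writes into $output{$name}.
    open my $fh, '>', \$output{$name} or die "cannot open $name: $!";
    push @rules, { regex => qr/$string/, file_handle => $fh };
}
close $rules_in;

open my $log_in, '<', \$log_text or die "cannot read log: $!";
while (my $line = <$log_in>) {
    for my $rule_ref (@rules) {
        my $regex = $rule_ref->{regex};
        my $fh    = $rule_ref->{file_handle};
        print {$fh} $line if $line =~ $regex;
    }
}
close $log_in;
close $_->{file_handle} for @rules;

print "$_:\n$output{$_}" for sort keys %output;
```

The `qr//` objects live as values in an array of hashes rather than as hash keys. Each one therefore stays a `Regexp` reference and is compiled only once, when the rules are read.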

      Hope this helps.

        WOW, never knew that. Thanks a lot! The execution time went down to 22 seconds.