Re: Efficient Log File Parsing with Regular Expressions

This runs a little faster than yours, but you might want to keep trying variations using the Benchmark module. The only real difference between mine and yours is that I assume $str_40_chars1 always begins with something other than a digit (0-9), and that everything before it is either a digit or a space (as your test data seems to indicate). Anyhow, here is the benchmarking that I did:

use strict;
use Benchmark qw/cmpthese/;
my $count = 50000;


my $regex2 = qr/^-([\d ]+)(.{40})(.{40})(.+)/;
my $regex1 = qr/^-((\S+\s+){18})(.{40})(.{40})(.+)/;

my @test_data;

# Test lines are read from a __DATA__ section at the end of the script (not shown here).
while ( <DATA> )
{
   push @test_data, $_;
}


cmpthese($count, {
  'enlil'   => sub {

      foreach my $line (@test_data) {

            if ($line =~ /^-3/) {

                    # Skip lines that fail to match, so $1..$4 never hold stale values.
                    next unless $line =~ m/$regex2/;

                    my $str_18_fields        = $1;
                    my $str_40_chars1        = $2;
                    my $str_40_chars2        = $3;
                    my $str_remain            = $4;

                    $str_40_chars1 =~ s!\|!_!;
                    $str_40_chars2 =~ s!\|!_!;


                    #print "\(ENLIL)str_18_fields    = $str_18_fields\n";
                    #print "\(ENLIL)str_40_chars1    = $str_40_chars1\n";
                    #print "\(ENLIL)str_40_chars2    = $str_40_chars2\n";
                    #print "\(ENLIL)str_remain        = $str_remain\n\n";
            }

      }
  },

  'hackdaddy'   => sub {

     foreach my $line (@test_data) {

          if ($line =~ /^-3/) {

                  # Skip lines that fail to match, so $1..$5 never hold stale values.
                  next unless $line =~ m/$regex1/;

                  my $str_18_fields        = $1;
                  my $str_40_chars1        = $3;
                  my $str_40_chars2        = $4;
                  my $str_remain            = $5;

                  $str_40_chars1 =~ s!\|!_!;
                  $str_40_chars2 =~ s!\|!_!;


                  #print "\(hackdaddy)str_18_fields    = $str_18_fields\n";
                  #print "\(hackdaddy)str_40_chars1    = $str_40_chars1\n";
                  #print "\(hackdaddy)str_40_chars2    = $str_40_chars2\n";
                  #print "\(hackdaddy)str_remain        = $str_remain\n\n";
          }
     }
  },

 });

If nothing else you can keep adding subs for further benchmarking.
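
Before adding more subs, it is probably worth checking that every candidate pulls out the same fields, since timing differences mean little if the captures differ. Here is a rough sketch of such a check. The sample line is made up for illustration (a leading '-', 18 space-separated numeric fields, two 40-character columns, then trailing text) and is not taken from the real log data, and captures_agree is just a helper name I picked:

```perl
use strict;
use warnings;

# The two patterns from the benchmark.
my $regex2 = qr/^-([\d ]+)(.{40})(.{40})(.+)/;
my $regex1 = qr/^-((\S+\s+){18})(.{40})(.{40})(.+)/;

# Hypothetical sample line, built to satisfy the assumptions stated above.
my $fields = join ' ', 3, 1 .. 17;           # 18 numeric fields, first one is 3
my $col1   = sprintf '%-40s', 'alpha|one';   # padded to exactly 40 characters
my $col2   = sprintf '%-40s', 'beta|two';    # padded to exactly 40 characters
my $sample = '-' . $fields . ' ' . $col1 . $col2 . "tail text\n";

# Returns true when both patterns match and capture identical fields.
sub captures_agree {
    my ($line) = @_;

    my @enlil     = $line =~ $regex2;
    # $regex1 has an extra inner group (the repeated field), so skip capture 2.
    my @hackdaddy = ($line =~ $regex1)[0, 2, 3, 4];

    # A failed match yields an empty list, so both counts must be 4.
    return 0 unless @enlil == 4 && @hackdaddy == 4;
    return join("\0", @enlil) eq join("\0", @hackdaddy) ? 1 : 0;
}

print captures_agree($sample) ? "captures agree\n" : "captures differ\n";
```

Running the same helper over each line of the real test data before calling cmpthese would show whether the digit-or-space assumption actually holds for your logs.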

-enlil
