
This runs a little faster than yours, but you may want to keep trying variations with the Benchmark module. The only real difference between mine and yours is an assumption: that $str_40_chars1 always begins with something other than a digit (0-9) or a space, and that everything before it is either a digit or a space (as your test data seems to indicate). Anyhow, here is the benchmarking that I did:

use strict;
use Benchmark qw/cmpthese/;
my $count = 50000;


my $regex2 = qr/^-([\d ]+)(.{40})(.{40})(.+)/;
my $regex1 = qr/^-((\S+\s+){18})(.{40})(.{40})(.+)/;

my @test_data;

# Test lines come from a __DATA__ section (not shown in this post).
my @test_data = <DATA>;


cmpthese($count, {
  'enlil'   => sub {

      foreach my $line (@test_data) {

            if ($line =~ /^-3/) {

                    # Skip lines the pattern does not match, so $1..$4 are never stale.
                    next unless $line =~ m/$regex2/;

                    my $str_18_fields        = $1;
                    my $str_40_chars1        = $2;
                    my $str_40_chars2        = $3;
                    my $str_remain            = $4;

                    $str_40_chars1 =~ s!\|!_!;
                    $str_40_chars2 =~ s!\|!_!;


                    #print "\(ENLIL)str_18_fields    = $str_18_fields\n";
                    #print "\(ENLIL)str_40_chars1    = $str_40_chars1\n";
                    #print "\(ENLIL)str_40_chars2    = $str_40_chars2\n";
                    #print "\(ENLIL)str_remain        = $str_remain\n\n";
            }

      }
  },

  'hackdaddy'   => sub {

     foreach my $line (@test_data) {

          if ($line =~ /^-3/) {

                  # Skip lines the pattern does not match, so $1..$5 are never stale.
                  next unless $line =~ m/$regex1/;

                  my $str_18_fields        = $1;
                  my $str_40_chars1        = $3;
                  my $str_40_chars2        = $4;
                  my $str_remain            = $5;

                  $str_40_chars1 =~ s!\|!_!;
                  $str_40_chars2 =~ s!\|!_!;


                  #print "\(hackdaddy)str_18_fields    = $str_18_fields\n";
                  #print "\(hackdaddy)str_40_chars1    = $str_40_chars1\n";
                  #print "\(hackdaddy)str_40_chars2    = $str_40_chars2\n";
                  #print "\(hackdaddy)str_remain        = $str_remain\n\n";
          }
     }
  },

 });

If nothing else you can keep adding subs for further benchmarking.
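
Before trusting any timing, it is worth checking that the two patterns actually split a line the same way. The following is only a rough sketch: the sample lines are made up (the real log format is not shown here), and captures() and fields_agree() are hypothetical helper names, not anything from the original code. It also shows what happens when my assumption fails, that is, when the first 40-character column starts with a digit.

```perl
use strict;
use warnings;

# The two patterns from the benchmark above.
my $regex1 = qr/^-((\S+\s+){18})(.{40})(.{40})(.+)/;
my $regex2 = qr/^-([\d ]+)(.{40})(.{40})(.+)/;

# Return the selected captures, or an empty list if the pattern fails.
sub captures {
    my ($line, $regex, @wanted) = @_;
    my @caps = $line =~ $regex or return;
    return @caps[@wanted];
}

# True when both patterns split the line into the same four pieces.
sub fields_agree {
    my ($line) = @_;
    my @old = captures($line, $regex1, 0, 2, 3, 4);   # $1, $3, $4, $5
    my @new = captures($line, $regex2, 0, 1, 2, 3);   # $1 .. $4
    my $ok  = @old == 4 && @new == 4
           && 0 == grep { $old[$_] ne $new[$_] } 0 .. 3;
    return $ok;
}

# Hypothetical sample lines: 18 numeric fields, two 40-character columns,
# then the remainder. These are assumptions, not real log data.
my $fields = join(' ', 3, 1 .. 17) . ' ';
my $rest   = 'rest of the line';
my $good   = '-' . $fields . sprintf('%-40s%-40s', 'first|name',  'second|name') . $rest;
my $bad    = '-' . $fields . sprintf('%-40s%-40s', '42nd street', 'second|name') . $rest;

print 'letter first: ', (fields_agree($good) ? 'same' : 'different'), "\n";
print 'digit first:  ', (fields_agree($bad)  ? 'same' : 'different'), "\n";
```

With these made-up lines the first check should report "same" and the second "different", because [\d ]+ swallows the leading "42" of the column. If your real data can start that column with a digit, stick with the field-counting pattern.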

-enlil

In reply to Re: Efficient Log File Parsing with Regular Expressions by Enlil
in thread Efficient Log File Parsing with Regular Expressions by hackdaddy
