Your regex is really horribly inefficient. Think about what it has to do to match that pattern. A much simpler way to do the same thing is:

#!/usr/local/bin/perl

use warnings;
use strict;
use English;
use Data::Dumper;
use Time::HiRes 'time';

my $logfile;

my $count = 0;                  # Initialize counter
my $start = time();             # Start the timer

my $fullrec;

while( <DATA>) {
    if (/^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/) {
        $count++;   
        #if ($fullrec) {
        #   process($fullrec);
        #}
        $fullrec = $_;
    } else {
        $fullrec .= $_;
    }
}

#if (defined($fullrec)) {
#   process($fullrec);
#}

my $end = time();               # Stop the timer
my $elapsed = $end - $start;    # How long did that take?
my $average = $count ? $elapsed / $count : 0;  # Average time per entry (0 if nothing matched)

printf "Parsed $count log file entries in %.4f seconds, averaging %.4f\n", $elapsed, $average;

exit;


__DATA__
<SNIPPED DATA>

I.e. you know that when a line starts with a date, it begins a record; this also implies that it marks the end of the previous record (if such a record exists). Thus you simply need to check each line to see whether it begins a record, and then do something with the previous record that you have built up. This also means that the pattern is anchored and only needs to compare Lr (the length of the pattern) chars per line, instead of the Lf (the length of the file) times Lr that your code would do (with the look-ahead assertion).
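To make the "do something with the previous record" step concrete, here is a minimal sketch of the same loop with the record hand-off enabled. The helper name split_records and the sample log lines are invented for illustration; in real code you would probably call your own process() routine instead of collecting records in an array.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): groups lines into
# records, where a record starts at any line beginning with a timestamp
# such as "2006-01-02 03:04:05:678".
sub split_records {
    my ($fh) = @_;
    my @records;
    my $fullrec;
    while ( my $line = <$fh> ) {
        if ( $line =~ /^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d:\d\d\d/ ) {
            # A new record begins, so the previous one (if any) is complete.
            push @records, $fullrec if defined $fullrec;
            $fullrec = $line;
        }
        elsif ( defined $fullrec ) {
            # Continuation line: append it to the record being built.
            $fullrec .= $line;
        }
        # Lines seen before the first timestamp are skipped in this sketch.
    }
    # Do not forget the last record once the input is exhausted.
    push @records, $fullrec if defined $fullrec;
    return @records;
}

# Invented sample data, only here to exercise the loop.
my $sample = <<'END';
2006-01-02 03:04:05:678 first entry
  continuation line
2006-01-02 03:04:06:001 second entry
END

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my @records = split_records($fh);
close $fh;

printf "Found %d records\n", scalar @records;
```

Collecting everything into an array is only for demonstration; for a large log file you would process each record as soon as it is complete, so that only one record is held in memory at a time.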

So your pattern does something like 55000000*23 character lookups; even worse, most of those will be char class lookups, so they are inefficient to start with. If you use the line-by-line approach you are dealing with 366000*23 lookups. That's a LOT less. (Actually these are upper bounds, but I think the point is made.)
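For a sense of scale, the two upper bounds can be worked out directly. This is just the arithmetic from the paragraph above, reading 55000000 as the file length in characters and 366000 as the number of lines (my reading of the figures, not a measurement):

```perl
use strict;
use warnings;

my $pattern_len = 23;          # characters in the timestamp pattern
my $file_chars  = 55_000_000;  # figure quoted above, assumed to be file length
my $file_lines  = 366_000;     # figure quoted above, assumed to be line count

# Upper bound when the pattern may be tried at every character position.
my $per_char = $file_chars * $pattern_len;

# Upper bound when the anchored pattern is tried once per line.
my $per_line = $file_lines * $pattern_len;

printf "per-character bound: %d lookups\n", $per_char;
printf "per-line bound:      %d lookups\n", $per_line;
printf "ratio:               about %.0f to 1\n", $per_char / $per_line;
```

The ratio is simply the average line length, which is why the anchored, line-by-line match wins by a couple of orders of magnitude here.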

Update: Fixed as per ikegami's reply.

---
$world=~s/war/peace/g

In reply to Re: Pimp My RegEx by demerphq
in thread Pimp My RegEx by heathen
