Re^3: How to improve speed of reading big files

If you have the time and inclination, try this and see how you fare. You might have to make a few tweaks: it compiles clean but is otherwise untested beyond a mental run-through, and my mind is notoriously unreliable :)

sub mergeLogs {
    my ($day, @files) = @_;
    my @lines;

    foreach my $file (@files) {
        my $fh = openLogFile($file);
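        ## 'warn LIST, next' runs the next before warn ever prints,
        ## so the two are kept as separate statements.
        unless (defined $fh) {
            warn "$0: ignoring file $file\n";
            next;
        }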
        warn "-> processing $file\n" if $opts{'verbose'} > 0;

        while( <$fh> ) {
            next unless /Running|Dump|FromCB|Update/o;
            next if exists $opts{'day'} && ! /^$opts{'day'}/o;
            next if exists $opts{'user'} && ! /[\(\[]\s*(?:$opts{'user'})/o;
            
            $opts{'server'} = lc $1, next 
                if !exists $opts{'server'} && /\* Running on (\w+) -/;
            
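            ## List assignment captures the timestamp itself; a scalar
            ## assignment would only store the match's true/false result.
            ## Lines without a timestamp cannot be sorted, so skip them.
            my ($time) = /(\d{2}:\d{2}:\d{2}\.\d{3})/o
                or next;
            next if exists $opts{'start-time'} && $time lt $opts{'start-time'};
            next if exists $opts{'stop-time'}  && $time gt $opts{'stop-time'};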

            s/ {2,}/ /go;
            s/ ?: / /go;
            s/^((?:\S+ ){3}).+?\[?I\]?:/$1/o;
            s/ ACK (\w) / $1 ACK /o;

            warn $_ if $opts{'verbose'} > 3;
            ## prepend the time key now save reparsing later
            push @lines, $time . $_; 
        }
        close $fh;
    }
    ## Sort in-place to save memory shuffling 
    ## (if it decides to play ball today)
    @lines = sort @lines; 

    substr $_, 0, 12, '' for @lines; ## Trim the prepended keys
    
    return \@lines;
}
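
For anyone who wants to try the prepend-key / sort / trim idea on its own before touching the real logs, here is a minimal standalone sketch. The helper name sort_by_time and the sample lines are made up for illustration; only the timestamp pattern and the 12-character key width come from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): sort log lines by the
# first HH:MM:SS.mmm timestamp found in each line.
sub sort_by_time {
    my @lines = @_;
    my @keyed;
    for my $line (@lines) {
        # List-context assignment captures the match, not a true/false flag.
        my ($time) = $line =~ /(\d{2}:\d{2}:\d{2}\.\d{3})/;
        next unless defined $time;       # skip lines without a timestamp
        push @keyed, $time . $line;      # fixed-width, 12-character key
    }
    @keyed = sort @keyed;                # plain string sort, no comparator
    substr $_, 0, 12, '' for @keyed;     # strip the prepended keys again
    return @keyed;
}

# Made-up sample lines, purely for illustration.
my @sorted = sort_by_time(
    "Mon 10:00:02.500 second\n",
    "Mon 09:59:59.001 first\n",
    "no timestamp here\n",
);
print @sorted;
```

Because the key is a fixed width and zero-padded, the default string sort orders it chronologically, which is why no comparison block is needed.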

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP PCW It is as I've been saying!(Audio until 20090817)

Re^4: How to improve speed of reading big files by Anonymous Monk on Sep 21, 2009 at 13:13 UTC
I used the above code with some minor modifications: I removed the day at the beginning of each line so that the time comes first. The lines are then ready to be sorted and no substr is needed!

Results:
- before (old old old code, no tweaks) --> 84 sec
- new and improved code --> 17 sec

The biggest gain (more than 40 sec) came from using while on the handle instead of list context with grep.

Thanks a lot!!!
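
The old code is not shown in this thread, so the following is only a guess at the shape of the difference described above: grep over the handle in list context versus a while loop reading one line at a time. The in-memory "file" and the variable names are invented so the sketch runs on its own.

```perl
use strict;
use warnings;

# Invented sample data standing in for a log file (an assumption).
my $log = join '', map { "line $_ " . ( $_ % 2 ? "Running" : "Idle" ) . "\n" } 1 .. 6;

# List context: every line is read into a temporary list before grep filters it.
open my $fh1, '<', \$log or die "open: $!";
my @slurped = grep { /Running/ } <$fh1>;
close $fh1;

# Scalar context: one line at a time, and only the matches are kept.
open my $fh2, '<', \$log or die "open: $!";
my @streamed;
while ( my $line = <$fh2> ) {
    next unless $line =~ /Running/;
    push @streamed, $line;
}
close $fh2;

print scalar(@streamed), " matching lines either way\n";
```

Both forms select the same lines. The while form never holds the unfiltered file in memory, which is probably where the saving on big files comes from.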
Re^5: How to improve speed of reading big files by BrowserUk (Patriarch) on Sep 21, 2009 at 13:34 UTC
Chopping off the date instead of prepending/sorting/removing the time--cool!
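
A rough sketch of that date-chopping variant, for comparison. The actual log layout is not shown in the thread, so the "<date> <HH:MM:SS.mmm> <message>" format and the sample lines below are assumptions.

```perl
use strict;
use warnings;

# Hypothetical line layout (an assumption): "<date> <HH:MM:SS.mmm> <message>".
my @raw = (
    "2009-09-21 13:34:10.250 Update done\n",
    "2009-09-21 13:13:05.125 Running on alpha\n",
);

my @lines;
for (@raw) {
    my $line = $_;            # copy, so @raw is left untouched
    $line =~ s/^\S+\s+//;     # drop the leading date token
    push @lines, $line;       # the line now starts with its own sort key
}

@lines = sort @lines;         # no key to prepend, no substr to trim
print @lines;
```

This only sorts correctly when every line comes from the same day, since the date is gone by the time the lines are compared.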