in reply to Re: How to improve speed of reading big files
in thread How to improve speed of reading big files
Thanks for your reply, it helped a lot.
I gained 10 seconds on the filterLog function and 10 more on mergeLogs (not counting the filterLog part).
Here is the final version of the code:
sub mergeLogs {
    my ($day, @files) = @_;
    my @lines;

    foreach my $file (@files) {
        my $fh = openLogFile($file);
        if (! defined $fh) {
            warn "$0: ignoring file $file\n";
            next;
        }
        warn "-> processing $file\n" if $opts{'verbose'} > 0;
        push @lines, grep { /Running|Dump|FromCB|Update/o && &filterLog } <$fh>;
        close $fh;
    }

    use Benchmark 'cmpthese';
    cmpthese( -1, {
        'ST'   => sub { ( sortLinesST(\@lines)   )[0] },
        'GRT'  => sub { ( sortLinesGRT(\@lines)  )[0] },
        'GRT2' => sub { ( sortLinesGRT2(\@lines) )[0] },
        'GRT3' => sub { ( sortLinesGRT3(\@lines) )[0] },
    } );
    exit;
}

sub filterLog {
    return 0 if exists $opts{'day'} && ! /^$opts{'day'}/o;

    if (! exists $opts{'server'}) {
        if (/\* Running on (\w+) -/) {
            $opts{'server'} = lc $1;
            return 0;
        }
    }

    if (exists $opts{'start-time'} || exists $opts{'stop-time'}) {
        if (/(\d{2}:\d{2}:\d{2}\.\d{3})/o) {
            return 0 if exists $opts{'start-time'} && $1 lt $opts{'start-time'};
            return 0 if exists $opts{'stop-time'}  && $1 gt $opts{'stop-time'};
        }
    }

    return 0 if exists $opts{'user'} && ! /[\(\[]\s*(?:$opts{'user'})/o;

    s/ {2,}/ /go;
    s/ ?: / /go;
    s/^((?:\S+ ){3}).+?\[?I\]?:/$1/o;
    s/ ACK (\w) / $1 ACK /o;

    warn $_ if $opts{'verbose'} > 3;
    return 1;
}

sub sortLinesST {
    my $href = shift;
    return [ map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, /(\d{2}:\d{2}:\d{2}\.\d{3})/o ] } @$href ];
}

sub sortLinesGRT {
    my $href = shift;
    return [ map { substr($_, 12) }
             sort
             map { /(\d{2}:\d{2}:\d{2}\.\d{3})/o; $1 . $_ } @$href ];
}

sub sortLinesGRT2 {
    my $href = shift;
    return [ map { substr($_, 4) }
             sort
             map { /(\d{2}):(\d{2}):(\d{2})\.(\d{3})/o;
                   pack('NA*', ( $1*60 + $2 )*60 + "$3.$4" ) . $_ } @$href ];
}

sub sortLinesGRT3 {
    my $href = shift;
    return [ @$href[
        map { unpack "N", substr($_, -4) }
        sort
        map { $href->[$_] =~ /(\d{2}:\d{2}:\d{2}\.\d{3})/o;
              $1 . pack("N", $_) } 0..$#$href
    ] ];
}
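One side observation on sortLinesGRT2 (my own note, not from the thread): pack 'N' stores an unsigned 32-bit integer, so the fractional part of the computed seconds value is silently truncated. That means the GRT2 key drops the milliseconds entirely, and all lines within the same second get identical keys. A minimal sketch with stand-in values for the $1..$4 captures:

```perl
use strict;
use warnings;

# Stand-in values for the HH, MM, SS, mmm captures.
my ($h, $m, $s, $ms) = ('10', '02', '45', '123');

# Mirrors the key built in sortLinesGRT2: "$s.$ms" numifies to 45.123,
# but pack 'N' packs an unsigned 32-bit *integer*, so the fractional
# (millisecond) part is truncated before the key is built.
my $key = pack 'N', ($h * 60 + $m) * 60 + "$s.$ms";

my $seconds = unpack 'N', $key;
# $seconds is 36165 (10*3600 + 2*60 + 45) -- the .123 is gone, so two
# lines in the same second compare equal under GRT2's sort.
```

In practice perl's default mergesort happens to keep equal-keyed elements in their original order, so the lost milliseconds may go unnoticed, but the GRT2 results are not strictly comparable to the other three subs.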
I ran some benchmarks to find the appropriate sort technique, here are the results:
        Rate GRT2   ST GRT3  GRT
GRT2  30.2/s   -- -19% -42% -45%
ST    37.1/s  23%   -- -28% -32%
GRT3  51.9/s  72%  40%   --  -5%
GRT   54.6/s  81%  47%   5%   --
At first I would have thought that the GRT3 sub, with its index sorting, would be the fastest, but apparently not.
What I do not understand, however, is why the GRT2 sub is slower than the ST sub.
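For anyone comparing the two idioms, here is a minimal, self-contained sketch (with made-up log lines) of the ST and GRT approaches used above. The usual reason GRT wins is that the decorated strings go through perl's default lexical sort, avoiding both the Perl-level comparator block and the per-element arrayrefs that the Schwartzian Transform builds:

```perl
use strict;
use warnings;

# Hypothetical sample lines with the same "HH:MM:SS.mmm" timestamps.
my @lines = (
    "b 12:00:01.500 second\n",
    "a 12:00:01.250 first\n",
    "c 12:00:02.000 third\n",
);

# Schwartzian Transform: decorate each line with [line, key],
# sort via a comparator on the key, then undecorate.
my @st = map  { $_->[0] }
         sort { $a->[1] cmp $b->[1] }
         map  { [ $_, /(\d{2}:\d{2}:\d{2}\.\d{3})/ ] } @lines;

# Guttman-Rosler Transform: prepend the fixed-width (12-char) key
# to the line itself, sort the plain strings lexically, then strip
# the key off again with substr.
my @grt = map { substr $_, 12 }
          sort
          map { /(\d{2}:\d{2}:\d{2}\.\d{3})/; $1 . $_ } @lines;

# Both yield the lines ordered a, b, c by timestamp.
```

Both variants produce the same ordering; the difference is purely in how much work each element costs during the sort.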
Replies are listed 'Best First'.
Re^3: How to improve speed of reading big files by BrowserUk (Patriarch) on Sep 18, 2009 at 18:47 UTC
Re^3: How to improve speed of reading big files by BrowserUk (Patriarch) on Sep 18, 2009 at 22:20 UTC
    by Anonymous Monk on Sep 21, 2009 at 13:13 UTC
    by BrowserUk (Patriarch) on Sep 21, 2009 at 13:34 UTC