in reply to Re^3: Optimising a search of several thousand files
in thread Optimising a search of several thousand files

heh, well I did say that my implementation may have been a bit wonky.

Anyway, I re-worked it as you suggested, and interestingly it is now significantly slower...

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9098 files (total files:57912)
        56.07 real     4.10 user     0.97 sys

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9098 files (total files:57912)
        51.58 real     4.15 user     0.91 sys
The re-worked section of the code looks like so:
    ...
    undef $/;
    my $data = <IN>;
    my $pos  = index($data, 'McDarren');
    $/ = "\n";
    next FILE if $pos == -1;
    # seek(IN, $pos, 0);
    # chomp(my $line = <IN>);
    my ($user, $level) = (split /\t/, substr $data, $pos)[0,3];
    ...
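The idea is to slurp the whole file, use index to find the name quickly, and only then split the tail of the buffer into fields. A self-contained sketch of that approach (the sample record and the tab-separated field layout, with user in field 0 and level in field 3, are assumptions):

```perl
use strict;
use warnings;

# Hypothetical slurped buffer: tab-separated records, one per line
my $data = "someone\tx\ty\t10\nMcDarren\tfoo\tbar\t71\tmore\n";

my $pos = index $data, 'McDarren';   # fast substring scan of the whole buffer
die "not found\n" if $pos == -1;     # (the real code does: next FILE)

# Split only from the match onward, not the whole buffer
my ($user, $level) = (split /\t/, substr $data, $pos)[0, 3];
print "$user $level\n";   # McDarren 71
```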
Adding a limit to the split seems to improve things slightly...
    my ($user, $level) = (split /\t/, (substr $data, $pos), 5)[0,3];

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9100 files (total files:57914)
        47.50 real     0.79 user     0.80 sys
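The limit argument makes split stop after producing that many fields, so the rest of the slurped buffer is never tokenised; since only fields 0 and 3 are wanted, 5 fields suffice (the fifth soaks up the remainder). A minimal illustration with made-up data:

```perl
use strict;
use warnings;

# Fabricated record with a long tail of junk fields
my $tail = join "\t", 'McDarren', 'x', 'y', 71, ('junk') x 10_000;

# Without a limit, split tokenises every one of the 10_004 fields...
my @all = split /\t/, $tail;

# ...with a limit of 5 it stops early: fields 0..3 plus one catch-all tail
my @few = split /\t/, $tail, 5;

printf "%d vs %d fields\n", scalar @all, scalar @few;   # 10004 vs 5 fields
```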
Not a proper benchmark, I realise. Actually, how would I go about benchmarking this?

Re^5: Optimising a search of several thousand files
by glasswalk3r (Friar) on Jan 29, 2007 at 13:05 UTC
    Not a proper benchmark, I realise. Actually, how would I go about benchmarking this?

    See Benchmark and Devel::DProf.
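    For instance, Benchmark's cmpthese can compare the two split variants head to head (the buffer here is fabricated stand-in data, not your dump files):

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Fabricated buffer standing in for one slurped file
    my $data = join "\t", 'McDarren', 'a', 'b', 71, ('filler') x 5_000;

    cmpthese(-1, {   # run each variant for at least 1 CPU-second
        no_limit => sub { my @f = (split /\t/, $data)[0, 3] },
        limit_5  => sub { my @f = (split /\t/, $data, 5)[0, 3] },
    });
    ```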

    Alceu Rodrigues de Freitas Junior
    ---------------------------------
    "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill