in reply to Re^2: Optimising a search of several thousand files
in thread Optimising a search of several thousand files

Urk! What is that seek doing in there? You already have the line in $data and the start point in $pos. (split /\t/, substr $data, $pos)[0,3] ought to do the job. It may be faster to constrain split to just finding the first 4 elements, but I'd have to benchmark that.
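
Roughly like this; a sketch only, since the file loop, the search string ($name), and the tab-separated field layout (user in field 0, level in field 3) are assumptions based on your earlier post:

    # Sketch: @files and $name are stand-ins for whatever the real
    # script iterates over and searches for.
    FILE:
    for my $file (@files) {
        open my $in, '<', $file or die "$file: $!";
        my $data = do { local $/; <$in> };    # slurp the whole file
        close $in;

        my $pos = index $data, $name;
        next FILE if $pos == -1;

        # No seek or second read needed; split the tail of the
        # already-slurped data directly.
        my ($user, $level) = (split /\t/, substr $data, $pos)[0, 3];
        # ... use $user and $level here
    }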


DWIM is Perl's answer to Gödel

Re^4: Optimising a search of several thousand files
by McDarren (Abbot) on Jan 29, 2007 at 09:33 UTC
    Heh, well I did say that my implementation may have been a bit wonky.

    Anyway, I re-worked it as you suggested, and interestingly it is now significantly slower...

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9098 files (total files:57912)
           56.07 real          4.10 user          0.97 sys

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9098 files (total files:57912)
           51.58 real          4.15 user          0.91 sys
    The re-worked section of the code looks like so:
    ...
    undef $/;
    my $data = <IN>;
    my $pos = index($data, 'McDarren');
    $/ = "\n";
    next FILE if $pos == -1;
    # seek(IN, $pos, 0);
    # chomp(my $line = <IN>);
    my ($user, $level) = (split /\t/, substr $data, $pos)[0,3];
    ...
    Adding a limit to the split seems to improve things slightly (without the limit, split has to break up everything from $pos to the end of the slurped data, rather than stopping after the first few fields):
    my ($user, $level) = (split /\t/, (substr $data, $pos), 5)[0,3];

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9100 files (total files:57914)
           47.50 real          0.79 user          0.80 sys
    Not a proper benchmark, I realise. Actually, how would I go about benchmarking this?
      Not a proper benchmark, I realise. Actually, how would I go about benchmarking this?

      See Benchmark and Devel::DProf.
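
      For instance, a quick Benchmark comparison of the two split variants might look like this (the sample record is made up; substitute a real line from one of the dump files):

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      # Made-up sample record: a matching line followed by a long tail,
      # standing in for the rest of a slurped dump file.
      my $data = join "\t", 'McDarren', 'f1', 'f2', '71', ('filler') x 10_000;
      my $pos  = index $data, 'McDarren';

      cmpthese( -3, {
          no_limit => sub {
              my ($user, $level) = (split /\t/, substr $data, $pos)[0, 3];
          },
          limit_5  => sub {
              my ($user, $level) = (split /\t/, (substr $data, $pos), 5)[0, 3];
          },
      });

      A negative count tells cmpthese to run each sub for roughly that many CPU seconds and print a comparison table. Benchmark is for comparing isolated snippets like these; Devel::DProf profiles the whole script and shows where the time is really going.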

      Alceu Rodrigues de Freitas Junior
      ---------------------------------
      "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill