in reply to Re^3: Optimising a search of several thousand files
in thread Optimising a search of several thousand files

heh, well I did say that my implementation may have been a bit wonky.

Anyway, I re-worked it as you suggested, and interestingly it is now significantly slower...

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9098 files (total files:57912)
        56.07 real     4.10 user     0.97 sys

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9098 files (total files:57912)
        51.58 real     4.15 user     0.91 sys
The re-worked section of the code looks like so:
    ...
    undef $/;
    my $data = <IN>;
    my $pos  = index($data, 'McDarren');
    $/ = "\n";
    next FILE if $pos == -1;
    # seek(IN, $pos, 0);
    # chomp(my $line = <IN>);
    my ($user, $level) = (split /\t/, substr $data, $pos)[0,3];
    ...
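The idea is to slurp the whole file, use index to find the name quickly, and only then split the tail of the buffer into fields. A self-contained sketch of that approach (the sample record and the tab-separated field layout, with user in field 0 and level in field 3, are assumptions):

```perl
use strict;
use warnings;

# Hypothetical slurped buffer: tab-separated records, one per line
my $data = "someone\tx\ty\t10\nMcDarren\tfoo\tbar\t71\tmore\n";

my $pos = index $data, 'McDarren';   # fast substring scan of the whole buffer
die "not found\n" if $pos == -1;     # (the real code does: next FILE)

# Split only from the match onward, not the whole buffer
my ($user, $level) = (split /\t/, substr $data, $pos)[0, 3];
print "$user $level\n";   # McDarren 71
```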
Adding a limit to the split seems to improve things slightly...
    my ($user, $level) = (split /\t/, (substr $data, $pos), 5)[0,3];

    $ time ./gfather.pl dump.1167332700 McDarren 71
    Processed 9100 files (total files:57914)
        47.50 real     0.79 user     0.80 sys
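The limit argument makes split stop after producing that many fields, so the rest of the slurped buffer is never tokenised; since only fields 0 and 3 are wanted, 5 fields suffice (the fifth soaks up the remainder). A minimal illustration with made-up data:

```perl
use strict;
use warnings;

# Fabricated record with a long tail of junk fields
my $tail = join "\t", 'McDarren', 'x', 'y', 71, ('junk') x 10_000;

# Without a limit, split tokenises every one of the 10_004 fields...
my @all = split /\t/, $tail;

# ...with a limit of 5 it stops early: fields 0..3 plus one catch-all tail
my @few = split /\t/, $tail, 5;

printf "%d vs %d fields\n", scalar @all, scalar @few;   # 10004 vs 5 fields
```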
Not a proper benchmark, I realise. Actually, how would I go about benchmarking this?

Re^5: Optimising a search of several thousand files
by glasswalk3r (Friar) on Jan 29, 2007 at 13:05 UTC
    Not a proper benchmark, I realise. Actually, how would I go about benchmarking this?

    See Benchmark and Devel::DProf.
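    For instance, Benchmark's cmpthese can compare the two split variants head to head (the buffer here is fabricated stand-in data, not your dump files):

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Fabricated buffer standing in for one slurped file
    my $data = join "\t", 'McDarren', 'a', 'b', 71, ('filler') x 5_000;

    cmpthese(-1, {   # run each variant for at least 1 CPU-second
        no_limit => sub { my @f = (split /\t/, $data)[0, 3] },
        limit_5  => sub { my @f = (split /\t/, $data, 5)[0, 3] },
    });
    ```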

    Alceu Rodrigues de Freitas Junior
    ---------------------------------
    "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill