in reply to Fast string similarity method
I can't help with the matching itself, but for a slight adjustment, you might try benchmarking one of the following changes to your for loops:
for ( my $i = $size; $i--; ) {
    for ( my $j = $i; $j--; ) {
        ...
    }
}
or
foreach my $i ( 0 .. $size-1 ) {
    foreach my $j ( $i+1 .. $size-1 ) {
        ...
    }
}
or
foreach my $i ( 0 .. $size-1 ) {
    foreach my $string2 ( @{$arrayDocs}[ $i+1 .. $size-1 ] ) {
        # change references to '$arrayDocs->[$j]' to '$string2'
        ...
    }
}
The first one can save a couple of operations per iteration (most likely not significant compared to the contents of the loops, but it might shave off a second or two, and the same trick works in other languages (see below)). The second one assumes that Perl's optimization of iterating through a list of integers is faster than a C-style 'for' loop (see the benchmark below), and the last one tries to save time by reducing the number of times $arrayDocs->[$j] is dereferenced.
You'd have to test the last one yourself, as its payoff depends on the qualities of the data (how often you actually match).
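For what it's worth, here's a self-contained toy version of that third variant, just to show where $string2 replaces $arrayDocs->[$j]; the array contents and the length() test are stand-ins for your real data and similarity check:

my $arrayDocs = [ 'foo', 'food', 'bar', 'baz' ];   # stand-in data
my $size      = @$arrayDocs;
my $matches   = 0;
foreach my $i ( 0 .. $size-1 ) {
    my $string1 = $arrayDocs->[$i];
    foreach my $string2 ( @{$arrayDocs}[ $i+1 .. $size-1 ] ) {
        # $string2 is an alias to the slice element, so the body never
        # has to index into $arrayDocs->[$j] again
        $matches++ if length($string1) == length($string2);   # stand-in for your similarity test
    }
}
print "$matches pairs matched\n";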
I know people are going to complain that I'm optimizing the wrong part, but since the loops are quadratic, going from 5k to 150k records is 30 times the records and roughly 30² = ~900 times the iterations, so a few seconds shaved off at 5k records should take off ~900 times that amount at 150k records.
#              s/iter      orig backwards   foreach
# orig           10.6        --      -45%      -55%
# backwards      5.80       83%        --      -18%
# foreach        4.75      123%       22%        --
use Benchmark qw(cmpthese);

my $size = 5000;

my $orig = sub {
    for ( my $i = 0; $i < ($size - 1); $i++ ) {
        for ( my $j = $i + 1; $j < ($size - 1); $j++ ) { }
    }
};

my $backwards = sub {
    for ( my $i = $size; $i--; ) {
        for ( my $j = $i; $j--; ) { }
    }
};

my $foreach = sub {
    foreach my $i ( 0 .. $size-1 ) {
        foreach my $j ( $i+1 .. $size-1 ) { }
    }
};

cmpthese( 10, { orig => $orig, backwards => $backwards, foreach => $foreach } );
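If you want to see how the slice/alias variant stacks up, something along these lines could be added to the same cmpthese() call (the @docs array and the $alias name are just mine for the example); with empty loop bodies you're only measuring loop overhead plus the cost of building the slice, so real timings will depend on your actual comparison code:

my @docs  = ('x') x $size;   # filler strings, just so the slice has something to alias
my $alias = sub {
    foreach my $i ( 0 .. $size-1 ) {
        foreach my $string2 ( @docs[ $i+1 .. $size-1 ] ) { }
    }
};
# then add  alias => $alias  to the hash passed to cmpthese()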