in reply to Fast string similarity method
I can't help with the matching itself, but for a slight adjustment, you might try benchmarking one of the following changes to your for loops:
for ( my $i = $size; $i--; ) {
    for ( my $j = $i; $j--; ) {
        ...
    }
}
or
foreach my $i ( 0 .. $size-1 ) {
    foreach my $j ( $i+1 .. $size-1 ) {
        ...
    }
}
or
foreach my $i ( 0 .. $size-1 ) {
    foreach my $string2 ( @{$arrayDocs}[ $i+1 .. $size-1 ] ) {
        # change references to '$arrayDocs->[$j]' to '$string2'
        ...
    }
}
The first one can save a couple of operations per iteration (most likely not significant compared to the contents of the loops, but it might shave off a second or two, and the same trick works in other languages (see below)). The second one assumes that Perl's optimization of iterating through a list of integers is faster than a C-style 'for' loop (see the benchmark below), and the last one tries to save time by reducing the number of times $arrayDocs->[$j] is dereferenced.
You'd have to test the last one yourself, as its payoff depends on the qualities of the data (how often you actually match).
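For what it's worth, here's a self-contained toy version of that third variant, just to show where $string2 replaces $arrayDocs->[$j]; the array contents and the length() test are stand-ins for your real data and similarity check:

my $arrayDocs = [ 'foo', 'food', 'bar', 'baz' ];   # stand-in data
my $size      = @$arrayDocs;
my $matches   = 0;
foreach my $i ( 0 .. $size-1 ) {
    my $string1 = $arrayDocs->[$i];
    foreach my $string2 ( @{$arrayDocs}[ $i+1 .. $size-1 ] ) {
        # $string2 is an alias to the slice element, so the body never
        # has to index into $arrayDocs->[$j] again
        $matches++ if length($string1) == length($string2);   # stand-in for your similarity test
    }
}
print "$matches pairs matched\n";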
I know people are going to complain that I'm optimizing the wrong part, but since the loops are quadratic, going from 5k to 150k records is 30 times the records and roughly 30² = ~900 times the iterations, so a few seconds shaved off at 5k records should take off ~900 times that amount at 150k records.
#              s/iter      orig backwards   foreach
# orig           10.6        --      -45%      -55%
# backwards      5.80       83%        --      -18%
# foreach        4.75      123%       22%        --
use Benchmark qw(cmpthese);

my $size = 5000;

my $orig = sub {
    for ( my $i = 0; $i < ($size - 1); $i++ ) {
        for ( my $j = $i + 1; $j < ($size - 1); $j++ ) { }
    }
};

my $backwards = sub {
    for ( my $i = $size; $i--; ) {
        for ( my $j = $i; $j--; ) { }
    }
};

my $foreach = sub {
    foreach my $i ( 0 .. $size-1 ) {
        foreach my $j ( $i+1 .. $size-1 ) { }
    }
};

cmpthese( 10, { orig => $orig, backwards => $backwards, foreach => $foreach } );
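If you want to see how the slice/alias variant stacks up, something along these lines could be added to the same cmpthese() call (the @docs array and the $alias name are just mine for the example); with empty loop bodies you're only measuring loop overhead plus the cost of building the slice, so real timings will depend on your actual comparison code:

my @docs  = ('x') x $size;   # filler strings, just so the slice has something to alias
my $alias = sub {
    foreach my $i ( 0 .. $size-1 ) {
        foreach my $string2 ( @docs[ $i+1 .. $size-1 ] ) { }
    }
};
# then add  alias => $alias  to the hash passed to cmpthese()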