in reply to Challenge: Find median string in a list
Thinking about ways to speed it up, it seemed to me that the inner loop should first compare against words suspected to be far from the center, so as to break out as early as possible. I thought about picking random combinations of words and saving the pairs with the greatest distance. Or picking a word, finding the word with the greatest distance from it, finding the word with the greatest distance from that, repeat, then using the last pair of words to test against first in the inner loop. But it turns out that, with the optimizations you had both in, sorting the wordlist by decreasing word length gets the number of distance() calls down to just 86006 on your sample data with 14300 words, in a very short period of time:#!/usr/bin/perl use strict; use warnings; use Text::LevenshteinXS; use constant WORD => 0; use constant LEN => 1; use constant MAX => 10_000_000; my @word; while (<DATA>) { tr/\r\n//d; push @word, [$_, 0]; } @word = sort {$a->[WORD] cmp $b->[WORD]} @word; my ($min_word, $min_dist) = ('', MAX); for my $i (0 .. $#word) { next if $word[$i][LEN] >= $min_dist; for my $j (0 .. $#word) { my $dist = distance($word[$i][WORD], $word[$j][WORD]); $word[$i][LEN] = $dist if $dist > $word[$i][LEN]; $word[$j][LEN] = $dist if $dist > $word[$j][LEN]; last if $dist >= $min_dist; } ($min_word, $min_dist) = ($word[$i][WORD], $word[$i][LEN]) if $wor +d[ +$i][LEN] < $min_dist; } print "$min_word\t$min_dist\n"; __DATA__
@word = sort {length($b->[WORD]) <=> length($a->[WORD])} @word;
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Challenge: Find median string in a list
by Limbic~Region (Chancellor) on Jul 05, 2007 at 01:26 UTC |