----------
This is not exactly a new problem, and it's relatively easy to solve, at least for two strings. Just sort all the character positions for each string in alphabetical order, then walk through them linearly.
Problem is, the code for sorting the positions uses substr, which means it's copying large sections of the strings on each compare (for a complexity of n lg^2 n instead of n lg n?). Perhaps someone can rewrite the code to use character positions, but for now, this should do for proof of concept.use strict; use warnings; my ($s1, $s2); for (1..10000) { $s1 .= ('A'..'Z')[rand 26]; } for (1..10000) { $s2 .= ('A'..'Z')[rand 26]; } print longest($s1, $s2); sub longest { my ($s1, $s2) = @_; my (@s1, @s2, $p1, $p2, $c, $r, $max); @s1 = sortpos($s1); @s2 = sortpos($s2); $p1 = 0; $p2 = 0; $max = 0; while ($p1 <= $#s1 && $p2 <= $#s2) { $c = match($s1, $s1[$p1], $s2, $s2[$p2]); if ($c > $max) { $r = substr($s1, $s1[$p1], $c); $max = $c; } if ((substr($s1, $s1[$p1]) cmp substr($s2, $s2[$p2])) == -1) { $p1++; } else { $p2++; } } return $r; } sub match { my $p1 = $_[1]; my $p2 = $_[3]; my $max = length($_[0]) - $p1 < length($_[2]) - $p2 ? (length($_[0]) - $p1 - 1) : (length($_[2]) - $p2 - 1); my $c = 0; for (0..$max) { last if substr($_[0], ($p1 + $_), 1) ne substr($_[2], ($p2 + $ +_), 1); $c++; } return $c; } sub sortpos { my $s = $_[0]; return sort { substr($s, $a) cmp substr($s, $b) } 0..(length($s)-1 +); }
It's easier to make this sort of thing efficient in C or C++.
In reply to Re: Longest Common Subsequence
by TedPride
in thread Longest Common Subsequence
by Limbic~Region
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |