in reply to Longest Common Subsequence
----------
This is not exactly a new problem, and it's relatively easy to solve, at least for two strings. Just sort all the character positions for each string in alphabetical order, then walk through them linearly.
Problem is, the code for sorting the positions uses substr, which means it's copying large sections of the strings on each compare (for a complexity of n lg^2 n instead of n lg n?). Perhaps someone can rewrite the code to use character positions, but for now, this should do for proof of concept.use strict; use warnings; my ($s1, $s2); for (1..10000) { $s1 .= ('A'..'Z')[rand 26]; } for (1..10000) { $s2 .= ('A'..'Z')[rand 26]; } print longest($s1, $s2); sub longest { my ($s1, $s2) = @_; my (@s1, @s2, $p1, $p2, $c, $r, $max); @s1 = sortpos($s1); @s2 = sortpos($s2); $p1 = 0; $p2 = 0; $max = 0; while ($p1 <= $#s1 && $p2 <= $#s2) { $c = match($s1, $s1[$p1], $s2, $s2[$p2]); if ($c > $max) { $r = substr($s1, $s1[$p1], $c); $max = $c; } if ((substr($s1, $s1[$p1]) cmp substr($s2, $s2[$p2])) == -1) { $p1++; } else { $p2++; } } return $r; } sub match { my $p1 = $_[1]; my $p2 = $_[3]; my $max = length($_[0]) - $p1 < length($_[2]) - $p2 ? (length($_[0]) - $p1 - 1) : (length($_[2]) - $p2 - 1); my $c = 0; for (0..$max) { last if substr($_[0], ($p1 + $_), 1) ne substr($_[2], ($p2 + $ +_), 1); $c++; } return $c; } sub sortpos { my $s = $_[0]; return sort { substr($s, $a) cmp substr($s, $b) } 0..(length($s)-1 +); }
It's easier to make this sort of thing efficient in C or C++.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Longest Common Subsequence
by Limbic~Region (Chancellor) on May 13, 2006 at 14:52 UTC |