in reply to Finding a LCS module on word level

st_ale,
I adapted the code I wrote Re^2: Longest Common Subsequence to work on words instead of characters (and s/subsequence/substring/).
use Algorithm::Loops 'NestedLoops'; use List::Util 'reduce'; my @sentence; while (<DATA>) { chomp $_; push @sentence, [split " ", $_]; } print LCS(@sentence), "\n"; sub LCS{ my @sentence = @_; my @pos; for my $i (0 .. $#sentence) { my $string = $sentence[$i]; for (0 .. $#$string) { my $word = $string->[$_]; push @{$pos[$i]{$word}}, $_; } } my $sh_sentence = reduce {@$a < @$b ? $a : $b} @sentence; my %map; WORD: for my $word (@$sh_sentence) { my @loop; for (0 .. $#pos) { next WORD if ! $pos[$_]{$word}; push @loop, $pos[$_]{$word}; } my $next = NestedLoops([@loop]); while (my @word_map = $next->()) { my $key = join '-', @word_map; $map{$key} = $word; } } my @pile; for my $seq (keys %map) { push @pile, $map{$seq}; for (1 .. 2) { my $dir = $_ % 2 ? 1 : -1; my @offset = split /-/, $seq; $_ += $dir for @offset; my $next = join '-', @offset; while (exists $map{$next}) { $pile[-1] = $dir > 0 ? $pile[-1] . ' ' . $map{$next} : + $map{$next} . ' ' . $pile[-1]; $_ += $dir for @offset; $next = join '-', @offset; } } } return reduce {length($a) > length($b) ? $a : $b} @pile; } __DATA__ I am trying to find a perl LCS module in perl monk perl monk
It works, but I am sure it is the best implementation. I took a solution I wrote for finding the longest common subsequence for more than two strings, adapted it for longest common substring for more than two strings, and then adapted it again to use words than rather than characters. If you want the actual position in the string of the words, you will need to mess with %map.

Cheers - L~R