st_ale,
I adapted the code I wrote Re^2: Longest Common Subsequence to work on words instead of characters (and s/subsequence/substring/).
use Algorithm::Loops 'NestedLoops'; use List::Util 'reduce'; my @sentence; while (<DATA>) { chomp $_; push @sentence, [split " ", $_]; } print LCS(@sentence), "\n"; sub LCS{ my @sentence = @_; my @pos; for my $i (0 .. $#sentence) { my $string = $sentence[$i]; for (0 .. $#$string) { my $word = $string->[$_]; push @{$pos[$i]{$word}}, $_; } } my $sh_sentence = reduce {@$a < @$b ? $a : $b} @sentence; my %map; WORD: for my $word (@$sh_sentence) { my @loop; for (0 .. $#pos) { next WORD if ! $pos[$_]{$word}; push @loop, $pos[$_]{$word}; } my $next = NestedLoops([@loop]); while (my @word_map = $next->()) { my $key = join '-', @word_map; $map{$key} = $word; } } my @pile; for my $seq (keys %map) { push @pile, $map{$seq}; for (1 .. 2) { my $dir = $_ % 2 ? 1 : -1; my @offset = split /-/, $seq; $_ += $dir for @offset; my $next = join '-', @offset; while (exists $map{$next}) { $pile[-1] = $dir > 0 ? $pile[-1] . ' ' . $map{$next} : + $map{$next} . ' ' . $pile[-1]; $_ += $dir for @offset; $next = join '-', @offset; } } } return reduce {length($a) > length($b) ? $a : $b} @pile; } __DATA__ I am trying to find a perl LCS module in perl monk perl monk
It works, but I am sure it is the best implementation. I took a solution I wrote for finding the longest common subsequence for more than two strings, adapted it for longest common substring for more than two strings, and then adapted it again to use words than rather than characters. If you want the actual position in the string of the words, you will need to mess with %map.

Cheers - L~R


In reply to Re: Finding a LCS module on word level by Limbic~Region
in thread Finding a LCS module on word level by st_ale

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.