in reply to Re: True Brute Force Longest Common Sub Sequence
in thread True Brute Force Longest Common Sub Sequence

If you just want to use brute force then using a hash is really cheating as it is already the begining of optimisation. We need nasty embedded loops to go O(2^n*2^m). I think something like this represents to worst way to do it, first generating all possible subsequences and then testing every case. This does however seem to be the point.

use strict; sub subseqs { my $str = $_[0]; my @seqs = (''); for my $char (split //, $str) { for my $i ( 0..@seqs-1 ) { push @seqs, $seqs[$i].$char; } } return @seqs; } my $s1 = 'abcdefgh'; my $s2 = 'abecdgh'; my $best_len = 0; my $longest; for my $seq1 (subseqs($s1)) { for my $seq2 (subseqs($s2)) { next unless $seq1 eq $seq2; if (length($seq1) > $best_len) { $longest = $seq1; $best_len = length($seq1); } } } print $longest;

Note that there is an edge case when 2 or more common subsequences share the same length only the first will be found.

Replies are listed 'Best First'.
Re^3: True Brute Force Longest Common Sub Sequence
by ikegami (Patriarch) on Nov 25, 2007 at 18:14 UTC

    If you just want to use brute force then using a hash is really cheating as it is already the begining of optimisation.

    Have another look. That claim just isn't true.

    common still visits every single combination of inputs, so it's brute force. Said nasty embedded loops *are there* and the hash doesn't affect the complexity.

    common uses a hash to remove dups, something that isn't even necessary. It actually slows things down, if anything.

    Even with the optimized common, said nasty embedded loops *are there* overall and the using the optimized common doesn't affect the overall complexity.

      Even with the optimized common, said nasty embedded loops *are there* overall and the using the optimized common doesn't affect the overall complexity.

      Actually in the optimised version there are no nested loops and the overall complexity is greatly reduced. The non-optimised version certainly has embedded loops but as you show in the optimised version there is not any need for them because you are using a hash.

        I strongly disagree. Not only does the optimized common still requires the nested loops to find the LCS, worse/average O(N^2*M^2 + M+N + M+N) is still O(N^2*M^2).

Re^3: True Brute Force Longest Common Sub Sequence
by ikegami (Patriarch) on Nov 28, 2007 at 01:56 UTC

    Your analysis doesn't take into account $seqs[$i] . $char and $seq1 eq $seq2. They are both based on N and/or M. $seqs[$i] . $char won't affect the final O() because it's higher up in the loop, but $seq1 eq $seq2 is in the innermost loop.

    for my $seq1 (subseqs($s1)) { for my $seq2 (subseqs($s2)) {

    could be written as

    my @s1 = subseqs($s1); my @s2 = subseqs($s2); for my $seq1 (@s1) { for my $seq2 (@s2) {

    for faster code of the same complexity.