in reply to finding substrings that have been inserted into a string

OK, here's my first attempt at a solution:

my $old_seq='ATGC---ACGT---TAGCAAGGTAAAT';
my $new_seq='----AT-GC----ACGT----TAGCAA----------GGTAAAT---';
print "old seq is\n$old_seq\nnew seq is\n$new_seq\n";

my @old_array = split //, lc ($old_seq);
my @new_array = split //, lc ($new_seq);
my $n_old=0;   # current position in the old sequence
my $n_new=0;   # current position in the new sequence
my %gaps;      # start position in new sequence => length of the new gap

while (my $old_base = shift @old_array){
    my $new_base = shift @new_array;
    #print "base in oldseq is $old_base, base in newseq is $new_base\n";
    if ($old_base eq $new_base){
        # print "match!\n";
        $n_old++;
        $n_new++;
        next;
    }
    else{
        # print "no match! - must be a new gap at position $n_new new, $n_old old\n";
        my $new_gap_length=0;
        # skip ahead in the new sequence until it catches up with the old base
        while ($new_base = shift @new_array){
            $n_new++;
            $new_gap_length++;
            # print "newbase is $new_base\n";
            if ($new_base eq $old_base){
                # print "found it - length was $new_gap_length\n";
                $gaps{$n_new-$new_gap_length} = $new_gap_length;
                $n_old++;
                $n_new++;
                last;
            }
        }
    }
}

foreach (sort {$a <=> $b} keys %gaps){
    print "gap at position in new $_, length $gaps{$_}\n";
}

It always reports the new gap as being at the end of the existing gap in the cases where the new gap's position can't be unambiguously decided. Anyone spot any flaws in this?

Re^2: finding substrings that have been inserted into a string
by graff (Chancellor) on Sep 17, 2006 at 22:05 UTC
    Anyone spot any flaws in this?

    Well, is it a flaw that you don't report the final "---" at the end of the "new_string"? Your while loop is based on the length of the "old_string", and when that's done, you're not checking whether there's anything left in the new_string beyond the last match. It would be an easy thing to add an element to %gaps after the while loop, if the current value of $n_new is less than $#new_array.
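
A minimal sketch of that fix, for what it's worth. The sub name find_new_gaps is made up for illustration and is not from the original post; the scan is the same as in the script above, just wrapped in a sub. One assumption to flag: because the loop consumes @new_array with shift, whatever is still in @new_array once the old sequence is exhausted is the trailing part of the new sequence, so this version simply tests whether the array is non-empty rather than comparing $n_new against $#new_array.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): same scan as the
# script above, plus a check for a trailing gap after the loop.
# Returns a hashref: 0-based start position in the new sequence => gap length.
sub find_new_gaps {
    my ($old_seq, $new_seq) = @_;
    my @old_array = split //, lc $old_seq;
    my @new_array = split //, lc $new_seq;
    my $n_new = 0;
    my %gaps;

    while (defined(my $old_base = shift @old_array)) {
        my $new_base = shift @new_array;
        last unless defined $new_base;    # new sequence ran out early
        if ($old_base ne $new_base) {
            # Mismatch: skip ahead in the new sequence until it catches up.
            my $new_gap_length = 0;
            while (defined($new_base = shift @new_array)) {
                $n_new++;
                $new_gap_length++;
                if ($new_base eq $old_base) {
                    $gaps{$n_new - $new_gap_length} = $new_gap_length;
                    last;
                }
            }
        }
        $n_new++;
    }

    # Anything left over in @new_array lies beyond the last match,
    # so record it as one more gap starting at the current position.
    $gaps{$n_new} = scalar @new_array if @new_array;

    return \%gaps;
}

# Usage with the sequences from the original post.
my $gaps = find_new_gaps(
    'ATGC---ACGT---TAGCAAGGTAAAT',
    '----AT-GC----ACGT----TAGCAA----------GGTAAAT---',
);
foreach (sort { $a <=> $b } keys %$gaps) {
    print "gap at position in new $_, length $gaps->{$_}\n";
}
```

The sketch assumes the new sequence differs from the old one only by inserted characters. If that does not hold, the inner loop can run off the end of the new sequence without finding a match, and the reported gaps should not be trusted.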