in reply to Simple String Alignment

Dear Edward,

This was my approach to your problem. It runs through all the sequences you give it in the @sequences array, looking for the one with the biggest gaps and kicking out any where the letter order isn't right. It populates the @in_the_running array with the sequences that will work. While this is happening, it makes sure that the sequence with the biggest gap stays at the front of the array (index 0). It then prints the sequence with the biggest gap and runs through the rest of the sequences that qualified, comparing each one against the biggest-gap sequence letter by letter. If a letter matches, it prints the letter; if it doesn't, it prints a dash.

This is one of those fun problems where multiple approaches could be applied - I hope mine assists you with yours!

Best,
  -Adam
@in = split(//, <STDIN>);
pop(@in);    # drop the trailing newline typed by the user
$regex = '.*' . join('(.*)', @in) . '.*';
@sequences = qw/ABCDHA ACDBGJ CADHMK ABICID/;
for (@sequences) {
    @letters = split(//, $_);
    s/.*($in[0].*$in[$#in]).*/$1/;
    if ((@gaps) = (m/$regex/o)) {
        if (length(join('', @gaps)) > $biglen) {
            # biggest gap so far: keep it at index 0
            unshift(@in_the_running, $_);
            $biglen = length(join('', @gaps));
        }
        else {
            push(@in_the_running, $_);
        }
    }
}
print $in_the_running[0], "\n";
@base = split(//, $in_the_running[0]);
for my $seqno (1 .. $#in_the_running) {
    @seq = split(//, $in_the_running[$seqno]);
    for $q (@base) {
        if ($seq[0] eq $q) { print $q; shift(@seq); next }
        else { print '-' }
    }
    print "\n";
}
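
The letter-by-letter comparison described above can also be looked at in isolation. The following is only a minimal sketch, not part of the original post: the helper name align_to_base is hypothetical, and the input strings are chosen purely for illustration. It returns the aligned string instead of printing it, which makes the step easier to check on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): walk along $base and
# consume letters of $seq in order, emitting a dash wherever the next
# unconsumed letter of $seq does not match the current base letter.
sub align_to_base {
    my ($base, $seq) = @_;
    my @seq = split //, $seq;
    my $out = '';
    for my $q (split //, $base) {
        if (@seq && $seq[0] eq $q) {
            $out .= $q;      # match: keep the letter and advance in $seq
            shift @seq;
        }
        else {
            $out .= '-';     # no match: pad with a dash
        }
    }
    return $out;
}

print align_to_base('ABICID', 'ABCD'), "\n";   # prints AB-C-D
print align_to_base('ABICID', 'ACD'),  "\n";   # prints A--C-D
```

The extra `@seq &&` guard is an addition in this sketch: it avoids comparing against an undefined value once every letter of the shorter sequence has been consumed.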

Re^2: Simple String Alignment
by monkfan (Curate) on May 11, 2005 at 01:59 UTC
    Adrade,
    Thanks so much for your reply. I've included your part in my code. Somehow your approach doesn't consider the maximum span *in the given query*.
    For example with query: "ACD" it gives
    S4 ABIC
    S1 AB-C
    S2 A--C
    i.e. it left the "D" out of the alignment (kindly see the answers above). In the other case, given the query "AB", it over-aligned them and gives:
    S4 ABICID
    S2 A--C-D
    S1 AB-C-D
    Is there any way I can fix this in your code? I hope to hear from you again.

    Update: Finally! With some tweaks suggested by Adrade below, I can get the problem SOLVED! Many thanks to the rest of you, especially Adrade. Don't know what to say! For those who are interested in the final form of the code, check this out:
    Regards,
    Edward
      Dear Edward,

      I think I have discovered the problem. In my code, I did a pop(@in) just to remove the newline that occurs when the user types in the letters. This was accidentally left in your code. If you comment out line 41, everything should work fine!
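
      A stray pop is easy to miss because pop always removes the last element, whether that element is the newline or a real query letter. As a hedged alternative (a sketch only; the helper name query_letters is hypothetical and not taken from either poster's code), chomp removes a trailing newline only when one is present, so calling it on input that has already been cleaned does no harm:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one raw input line into a list of query letters.
sub query_letters {
    my ($line) = @_;
    chomp $line;             # strips a trailing newline only if there is one
    return split //, $line;
}

my @with_newline    = query_letters("ACD\n");
my @without_newline = query_letters("ACD");

# Both calls yield the same three letters: A, C, D.
print scalar(@with_newline), " ", scalar(@without_newline), "\n";   # prints 3 3
```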

      I hope this helps - let me know how it turns out!

      Best
        -Adam

      Update: The final (maybe :-) revised code (that should do everything correctly) can be found here.

      Another Update! Hehe - ok... let's hope this hits the mark: Code is here

      YA-Update: The correct and final code is in the above post. Yay.

      Be well,
        -Adam