in reply to merging dna sequences

In the spirit of TIMTOWTDI, here is another approach:

    use feature 'say';

    my $s1 = "AYGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTTA";
    my $s2 = "ACGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTKA";

    # Masks: keep A, C, G, T and turn every other character into "\0"
    my $s1_ = $s1; $s1_ =~ tr/ACGT/\0/c;
    my $s2_ = $s2; $s2_ =~ tr/ACGT/\0/c;

    my $merged = $s1 ^ $s1_ ^ $s2 ^ $s2_ | $s1_ & $s2_;

    say $merged;
    # AYGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTKA

Re^2: merging dna sequences
by Anonymous Monk on Nov 10, 2011 at 22:11 UTC
    I too thought "bitwise string" but ended up with a different mix of operators:
        use feature 'say';

        my $s = 'AYGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTTA';
        my $t = 'ACGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTKA';

        # Masks: A, C, G, T become "\0"; every other uppercase letter becomes "_"
        tr/ACGTA-Z/\0\0\0\0_/ for my ($s_, $t_) = ($s, $t);

        my $merged = ($s | $t_) & ($t | $s_);

        say $merged;
      This one is inspired by moritz's solution, which only replaced the last "T" with a "K" in the first string. It has one less bit-op than my original, so should be a bit faster. (LOL... that one snuck up on me while typing)
          use feature 'say';

          my $s = 'AYGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTTA';
          my $t = 'ACGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTKA';

          # $del: "_" where $t is ambiguous, "\0" elsewhere
          (my $del = $t) =~ tr/ACGTA-Z/\0\0\0\0_/;
          # $add: $t's ambiguous letters kept, "_" elsewhere
          (my $add = $t) =~ tr/ACGT/_/;

          my $merged = ($s | $del) & $add;

          say $merged;
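A note on why "_" works as the mask character in the two variants above: its code is 0x5F, and every uppercase letter A-Z (0x41 to 0x5A) only has bits that are also set in 0x5F. So OR-ing a letter with "_" gives "_", AND-ing a letter with "_" gives the letter back, and "\0" leaves a letter unchanged under OR. A quick way to convince yourself, as a minimal sketch (the helper name mask_ok is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: true if "_" and "\0" behave as described for one letter.
sub mask_ok {
    my ($c) = @_;
    return ( $c | '_' )  eq '_'    # OR with "_" swallows the letter
        && ( $c & '_' )  eq $c     # AND with "_" returns the letter
        && ( $c | "\0" ) eq $c;    # "\0" leaves the letter alone under OR
}

# Collect any uppercase letters for which the trick would not hold.
my @bad = grep { !mask_ok($_) } 'A' .. 'Z';
print @bad ? "fails for: @bad\n" : "'_' works as a mask for A-Z\n";
```

This only covers uppercase input; lowercase letters have bit 0x20 set, so they would need a different mask character.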
Re^2: merging dna sequences
by garyboyd (Acolyte) on Nov 10, 2011 at 12:29 UTC

    Thank you both for your solutions - would this still work for more than 2 strings?

      Is f'(f'(A,B),C) the same as F(A,B,C)? If so, just loop over your array of strings, replacing the first two with the result of f'(A,B). By the time you are down to a single string, you will have the same result as F(A,B,C,...).

      List::Util's reduce function can be helpful in implementing this.
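A minimal sketch of that idea, assuming the pairwise merge is the XOR/mask expression posted earlier in this thread. The names merge_pair and merge_all are made up for illustration, and the short test strings are not real data:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical two-string merge, built from the XOR/mask expression
# posted earlier in this thread.
sub merge_pair {
    my ($x, $y) = @_;
    (my $x_ = $x) =~ tr/ACGT/\0/c;   # keep A, C, G, T; zero everything else
    (my $y_ = $y) =~ tr/ACGT/\0/c;
    return $x ^ $x_ ^ $y ^ $y_ | $x_ & $y_;
}

# Fold the pairwise merge over any number of strings, left to right.
sub merge_all {
    my @seqs = @_;
    return reduce { merge_pair($a, $b) } @seqs;
}

print merge_all(qw(AYGTT ACGKT ACGTW)), "\n";   # prints AYGKW
```

reduce calls the block with the running result in $a and the next string in $b, which is the loop described above written as a fold.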

      --MidLifeXis

        would this still work for more than 2 strings

        As long as the ambiguous bases are in distinct positions, yes.

        In the general case, you can then write:

            use feature 'say';

            my @s = (
                "AYGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTTA",
                "ACGTACTAGACTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTKA",
                "ACGTACTAGWCTACAGACTACAGACATCTACAGACTCATCAGCAGCATATTTA",
                "ACGTACTAGACTACAGACTACAGMCATCTACAGACTCATCAGCAGCATATTTA",
                "ACGTACTAGACTACAGACTACAGACATCTACAGACTCATRAGCAGCATATTTA",
                # ...
            );

            my $m = shift @s;
            for my $s (@s) {
                my $m_ = $m; $m_ =~ tr/ACGT/\0/c;
                my $s_ = $s; $s_ =~ tr/ACGT/\0/c;
                $m = $m ^ $m_ ^ $s ^ $s_ | $m_ & $s_;
            }

            say $m;
            # AYGTACTAGWCTACAGACTACAGMCATCTACAGACTCATRAGCAGCATATTKA
            #  ^       ^             ^               ^           ^
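The "distinct positions" condition above can be checked before merging. With the XOR-based expression, two strings carrying the same ambiguity code at the same position cancel each other out to "\0" there, so it may be worth testing for clashes first. A rough sketch, where the helper name ambiguity_clashes and the short strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based positions where more than one
# input string carries a non-ACGT character.
sub ambiguity_clashes {
    my @seqs = @_;
    my %count;
    for my $seq (@seqs) {
        while ($seq =~ /[^ACGT]/g) {
            $count{ pos($seq) - 1 }++;   # pos() points just past the match
        }
    }
    return sort { $a <=> $b } grep { $count{$_} > 1 } keys %count;
}

my @clash = ambiguity_clashes(qw(AYGT ACGT AWGT));
print "clash at position(s): @clash\n" if @clash;   # prints: clash at position(s): 1
```

If the returned list is empty, the loop above should be safe to run as it stands.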

        (P.S., sorry — meant to reply to garyboyd... )