Re: Lower-casing Substrings and Iterating Two Files together

When I first read this, I did not assume that the sequences in "data2.txt" were always the same length as the corresponding sequences in "data1.txt". Prior replies have assumed they are, and produced suggestions that are more efficient than what I propose. If that assumption does not hold, the job can still be done with regular expressions. This snippet uses the OP's "seq1" to demonstrate constructing the search and replacement strings from the "hard mask" provided.

$seq =  'GGTACACAGAAGCCAAAGCAGGCTCCAGGCTCTGAGCTGTCAGCACAGAGACCGAT';
$mask = 'GGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT';

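# Turn each run of N's into a capturing group of the same width:
# 'GGTNNN...NT' becomes 'GGT(......)T'.
($srch = $mask) =~ s/(N+)/($1)/g;
$srch           =~ tr/N/./;

# Build a Perl expression as a string: the first run of N's becomes
# ' . lc($1) . ', the second ' . lc($2) . ', and so on.
$cnt = 1;
($repl = $mask) =~ s/N+/" . lc(\$" . $cnt++ . ") . "/ge;

print $srch, "\n", $repl, "\n\n", $seq, "\n";

# The first /e yields the string held in $repl; the second evaluates that
# string as code. The literal parts (GGT, T) are barewords there, so this
# only works while "strict subs" is off.
$seq =~ s/$srch/$repl/ee;
print $seq, "\n";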

The output should look like:

GGT(....................................................)T
GGT . lc($1) . T

GGTACACAGAAGCCAAAGCAGGCTCCAGGCTCTGAGCTGTCAGCACAGAGACCGAT
GGTacacagaagccaaagcaggctccaggctctgagctgtcagcacagagaccgaT
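
For anyone who wants to run this under "use strict", here is a rough sketch of the same idea without the double eval. The helper name lc_masked and the short test sequences are made up for illustration and are not from the OP's data. It assumes, as above, that the mask may be shorter than the sequence and may contain more than one run of N's.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): lower-case every region
# of $seq that lines up with a run of N's in $mask.
sub lc_masked {
    my ($seq, $mask) = @_;

    # Split the mask into N-runs and literal runs; the capture keeps both.
    my @parts = grep { length } split /(N+)/, $mask;

    # N-run   -> capturing group of the same width
    # literal -> quoted text, matched but not captured
    my $srch = join '', map {
        /^N/ ? '(' . ('.' x length($_)) . ')' : quotemeta($_)
    } @parts;

    # No match: hand the sequence back unchanged.
    my @caps = $seq =~ /$srch/ or return $seq;
    my ($start, $end) = ($-[0], $+[0]);   # save before any further matching

    # Rebuild the matched stretch, lower-casing each captured region.
    my $out = join '', map { /^N/ ? lc(shift @caps) : $_ } @parts;
    return substr($seq, 0, $start) . $out . substr($seq, $end);
}

print lc_masked('GGTACACAGT',   'GGTNNNNNNT'), "\n";   # GGTacacagT
print lc_masked('AAGGTACCTTGG', 'GGNNCCNNGG'), "\n";   # AAGGtaCCttGG
```

The trade-off is a few more lines in exchange for no string eval and no reliance on barewords; whether that matters depends on how much you trust the contents of the mask file.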

That said, if the aforementioned assumption does hold, I have to think that a search for efficiency should begin in the code that generated "data2.txt", if at all possible.
