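If you bitwise-or (|) an uppercase letter with a space (assuming Latin-1/ASCII files), it will lowercase it. The string of spaces has to be as long as the text, because Perl pads the shorter operand with zero bits:

    print 'ACGT' | ' ' x 4;;
    acgt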
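Why this works: in ASCII an uppercase letter and its lowercase form differ only in the 0x20 bit, and 0x20 is the code for a space. A minimal sketch of the arithmetic follows; `lc_by_or` is a hypothetical helper name used only for illustration, not something from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase ASCII letters by ORing every byte with a space.
# Only meaningful for text made of ASCII letters, such as ACGT sequence data.
sub lc_by_or {
    my ($text) = @_;
    return $text | ' ' x length($text);
}

# 'A' is 0x41 and ' ' is 0x20; setting the 0x20 bit gives 0x61, which is 'a'.
printf "%02x | %02x = %02x\n", ord('A'), ord(' '), ord('A') | ord(' ');

# String operands are ORed byte by byte, and the shorter one is padded with
# zero bits, so a single space only affects the first character.
print 'ACGT' | ' ', "\n";          # aCGT
print lc_by_or('ACGT'), "\n";      # acgt
```

Note that under `use v5.28` or `use feature 'bitwise'`, `|` always treats its operands as numbers and the string form is written `|.` instead.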
So, if you translate all the 'N's in your mask to spaces and then bitwise or the sequence and the mask, it will achieve your goal very efficiently:
    $s = 'GGTACACAGAAGCCAAAGCAGGCTCCAGGCTCTGAGCTGTCAGCACAGAGACCGAT';;
    $m = 'GGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT';;
    ( $mm = $m ) =~ tr[N][\x20];;
    print $mm;;
    GGT                                                    T
    print $s | $mm;;
    GGTacacagaagccaaagcaggctccaggctctgagctgtcagcacagagaccgaT
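The same steps can be wrapped in a small subroutine. This is only a sketch: `apply_mask` is a hypothetical name, and it assumes (as in the example above) that the mask repeats the sequence's own letters wherever it is not an `N`, since any differing letter would be ORed into the result.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase every position of $seq where $mask has an 'N'.
# Assumption: at non-N positions the mask holds the same letter as the sequence.
sub apply_mask {
    my ($seq, $mask) = @_;
    die "sequence and mask differ in length\n"
        unless length($seq) == length($mask);
    (my $spaces = $mask) =~ tr/N/ /;   # Ns => spaces, other letters untouched
    return $seq | $spaces;             # string OR, byte by byte
}

print apply_mask('GGTACACAGA', 'GGTNNNNNNA'), "\n";   # GGTacacagA
```

The length check is there because a shorter mask would be silently padded with zero bits, leaving the tail of the sequence in uppercase.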
Which makes your entire program (excluding the unmentioned fact that your files may be in FASTA format):
    #! perl -slw
    use strict;

    open SEQ,  '<', 'data1.dat' or die $!;
    open MASK, '<', 'data2.dat' or die $!;

    while( my $seq = <SEQ> ) {   ## Read a sequence
        my $mask = <MASK>;       ## And the corresponding mask
        $mask =~ tr[N][ ];       ## Ns => spaces
        print $seq | $mask;      ## bitwise-OR them and print the result
    }

    close SEQ;
    close MASK;
Redirect the output to a third file and you're done.
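If the files do turn out to be FASTA, the header lines need to be kept out of the OR. A rough sketch, assuming both files have the same headers and identically wrapped sequence lines; `mask_fasta` is a hypothetical name, and the demo uses in-memory filehandles in place of data1.dat and data2.dat:

```perl
use strict;
use warnings;

# Hypothetical sketch: the same lockstep loop, but lines starting with '>'
# (FASTA headers) are copied through instead of being ORed with the mask.
sub mask_fasta {
    my ($seq_fh, $mask_fh, $out_fh) = @_;
    while (defined(my $seq = <$seq_fh>)) {
        my $mask = <$mask_fh>;
        die "mask file is shorter than sequence file\n" unless defined $mask;
        chomp($seq, $mask);
        if ($seq =~ /^>/) {                    # header line: pass through
            print {$out_fh} "$seq\n";
            next;
        }
        die "sequence and mask lines differ in length\n"
            unless length($seq) == length($mask);
        $mask =~ tr/N/ /;                      # Ns => spaces
        print {$out_fh} $seq | $mask, "\n";    # bitwise-OR and print
    }
}

# Demo with in-memory data standing in for the two input files.
my $seq_data  = ">seq1\nGGTACACAGA\n";
my $mask_data = ">seq1\nGGTNNNNNNA\n";
open my $seq_fh,  '<', \$seq_data  or die $!;
open my $mask_fh, '<', \$mask_data or die $!;
my $out = '';
open my $out_fh, '>', \$out or die $!;
mask_fasta($seq_fh, $mask_fh, $out_fh);
close $out_fh;
print $out;
```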
In reply to Re: Lower-casing Substrings and Iterating Two Files together
by BrowserUk
in thread Lower-casing Substrings and Iterating Two Files together
by neversaint