Hi all, monks
I'm trying to mask part of a string based on a bit vector, e.g.:
    Str:  AGACGAGTA
    Mask: 001111100
    ---------------
    Res:  AGxxxxxTA
To do that, I wrote the following script (using a dummy example):
    use strict;
    use warnings;

    my $seq  = "AGACGAGTA";
    my $mask = '';
    vec($mask, $_, 1) = 1 for 2 .. 6;

    print "Str: $seq\n";
    print "Mask: ", unpack("b*", $mask), "\n";
At this point the output of the program is:
    Str: AGACGAGTA
    Mask: 00111110
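The mask prints only 8 bits because vec() allocates whole bytes, so the bit string is shorter than the 9-character sequence (the case the `no warnings` comment in Solution 2 guards against). A minimal sketch, assuming you want exactly one bit per sequence position: touching the bit for the last position makes vec() grow the string with zero bytes, and an explicit count in the unpack template then returns exactly `length($seq)` bits.

```perl
use strict;
use warnings;

my $seq  = "AGACGAGTA";
my $mask = '';
vec($mask, $_, 1) = 1 for 2 .. 6;

# vec() works in whole bytes: this 1-byte mask unpacks to 8 bits,
# while $seq has 9 characters.
print "Before: ", unpack('b*', $mask), "\n";

# OR-ing 0 into the bit for the last position of $seq keeps its value,
# but the write makes vec() extend $mask with zero bytes to cover it.
vec($mask, length($seq) - 1, 1) |= 0;

# Ask unpack for exactly length($seq) bits.
my $bits = unpack('b' . length($seq), $mask);
print "After:  $bits\n";
```

This prints `00111110` before and `001111100` after, so the mask and the sequence line up character for character.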
To apply the mask, I found two possible solutions:
    # Solution 1:
    my @mask = split "", unpack("b*", $mask);
    my @seq  = split "", $seq;
    print "Res: ";
    print map { !shift @mask ? $_ : "x" } @seq;
    print "\n";
and...
    # Solution 2:
    my $s_mask = unpack("b*", $mask);
    my $pos = 0;
    print "Res: ";
    while ($pos < length($seq)) {
        no warnings;    ## $s_mask could be shorter than $seq
        print !substr($s_mask, $pos, 1) ? substr($seq, $pos, 1) : "x";
        $pos++;
    }
    print "\n";
Both produce the expected output:
    Res: AGxxxxxTA
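Both solutions visit every character in Perl code. As a hedged alternative (not one of the two solutions above), a regex over the unpacked mask can find each run of set bits, and substr then overwrites only those runs in a copy of the sequence. The sub name `mask_runs` and its replacement-character parameter are hypothetical; the regex scan still passes over the bit string, but it does so inside the regex engine rather than in a Perl-level loop.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every run of set bits in the vec() bit
# vector $mask with $char. Only the masked runs of the copy are rewritten.
sub mask_runs {
    my ($seq, $mask, $char) = @_;
    my $res  = $seq;
    my $bits = unpack 'b*', $mask;    # one '0'/'1' per position

    while ($bits =~ /1+/g) {
        my ($start, $end) = ($-[0], $+[0]);    # run covers [$start, $end)
        last if $start >= length $res;         # bits past the end of $seq
        $end = length $res if $end > length $res;
        substr($res, $start, $end - $start) = $char x ($end - $start);
    }
    return $res;
}

my $seq  = "AGACGAGTA";
my $mask = '';
vec($mask, $_, 1) = 1 for 2 .. 6;
print "Res: ", mask_runs($seq, $mask, 'x'), "\n";
```

For long masks with few, long runs, the number of substr calls is the number of runs, not the number of positions.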
But I'm not happy with either of them: in real examples (strings to mask with lengths of ~10e5), stepping through both strings or arrays (the string to mask and the bit vector) one position at a time could be very inefficient, and one of the reasons to use bit vectors in the first place is how efficiently they can be built and combined ($mask1 | $mask2, $mask1 & $mask2, etc.).
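To illustrate the kind of combination meant here: when both operands are strings that have never been used as numbers, Perl's `|` and `&` operate on them byte by byte in C. Per perlop, with strings of different lengths `|` acts as if the shorter one were padded with zero bits, while `&` truncates to the shorter one. This sketch assumes the classic operators; under `use feature 'bitwise'` (enabled by `use v5.28`) the string forms are spelled `|.` and `&.`. The helper `make_mask` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a vec() bit vector with the given bits set.
sub make_mask {
    my $m = '';
    vec($m, $_, 1) = 1 for @_;
    return $m;
}

my $m1 = make_mask(2 .. 4);
my $m2 = make_mask(4 .. 6);

# String bitwise operators: processed a byte at a time, not bit by bit in Perl.
my $union        = $m1 | $m2;    # bits set in either mask
my $intersection = $m1 & $m2;    # bits set in both masks

print "Union:        ", unpack('b*', $union), "\n";           # 00111110
print "Intersection: ", unpack('b*', $intersection), "\n";    # 00001000
```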
Is there any way to do the masking without having to traverse the whole sequence? Or am I going about this the wrong way?
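One possible approach, sketched under assumptions not stated in the post (single-byte ASCII strings, a one-character replacement, the classic non-`bitwise`-feature `&`, `|` and `~`): expand the bit vector into a byte mask with tr, then let the string bitwise operators clear the masked bytes and OR the replacement character into them. Each step (unpack, tr, `&`, `|`) still passes over the whole string once, but in C rather than in a Perl loop. The sub name `mask_bitwise` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each position of $seq whose bit is set in
# the vec() bit vector $vec with the single character $char.
sub mask_bitwise {
    my ($seq, $vec, $char) = @_;
    my $len = length $seq;

    # One '0'/'1' per position, zero-padded or truncated to length($seq).
    my $bits = substr(unpack('b*', $vec) . ('0' x $len), 0, $len);

    # $keep: "\xFF" where the bit is 0 (keep the char), "\x00" where it is 1.
    (my $keep = $bits) =~ tr/01/\xFF\x00/;

    # ~$keep flips every byte, so AND-ing with $char repeated yields $char
    # at masked positions and "\x00" everywhere else.
    my $fill = ~$keep & ($char x $len);

    # Clear the masked bytes of $seq, then OR the replacement into them.
    return ($seq & $keep) | $fill;
}

my $seq  = "AGACGAGTA";
my $mask = '';
vec($mask, $_, 1) = 1 for 2 .. 6;
print "Res: ", mask_bitwise($seq, $mask, 'x'), "\n";
```

Because the byte masks are plain strings, the same $keep-style mask can be reused for several sequences of the same length.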
Thanks in advance!
citromatik