citromatik has asked for the wisdom of the Perl Monks concerning the following question:
Hi all, monks
I'm trying to mask part of a string based on a bit vector. For example:
    Str:  AGACGAGTA
    Mask: 001111100
    ---------------
    Res:  AGxxxxxTA
To do that I wrote the following script (to deal with a dummy example):
    use strict;
    use warnings;

    my $seq = "AGACGAGTA";
    my $mask = '';
    vec($mask, $_, 1) = 1 for (2..6);

    print "Str: $seq\n";
    print "Mask: ", unpack("b*", $mask), "\n";
At this point the output of the program is:
    Str: AGACGAGTA
    Mask: 00111110
To apply the mask I found two possible solutions:
    # Solution 1:
    my @mask = split "", unpack("b*", $mask);
    my @seq = split "", $seq;
    print "Res: ";
    print map { ! shift @mask ? $_ : "x" } @seq;
    print "\n";
and...
    # Solution 2:
    my $s_mask = unpack("b*", $mask);
    my $pos = 0;
    print "Res: ";
    while ($pos < length($seq)) {
        no warnings;  ## $s_mask could be shorter than $seq
        print ! substr($s_mask, $pos, 1) ? substr($seq, $pos, 1) : "x";
        $pos++;
    }
    print "\n";
Both produce the expected output:
    Res: AGxxxxxTA
But I am not happy with either of them. In real examples (strings to mask with lengths of ~10e5), stepping through both strings or arrays (the string to mask and the bit vector) could be very inefficient, and one of the reasons to use bit vectors is how efficiently they can be built and combined ($mask1 | $mask2, $mask1 & $mask2, etc.).
Is there any way to do the masking without having to traverse the whole sequence? Am I going in the wrong direction for this job?
Thanks in advance!
citromatik
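
One possible direction, offered only as a sketch and not as any replier's actual answer: the Perl-level loop can be replaced by whole-string operations (unpack, tr and the string forms of & and |). The data is still traversed, but inside Perl's internals instead of once per character in Perl code. The helper name apply_mask is hypothetical, and the sketch assumes a sequence of single-byte (ASCII) characters held as a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return a copy of $seq
# with 'x' at every position whose bit is set in the bit vector $mask.
# Assumes $seq is a plain string of single-byte (ASCII) characters.
sub apply_mask {
    my ($seq, $mask) = @_;
    my $len = length $seq;

    # Expand the bit vector to one '0'/'1' character per position, then
    # pad with '0' or truncate so it is exactly as long as the sequence.
    my $bits = unpack("b*", $mask);
    $bits = substr($bits . ("0" x $len), 0, $len);

    # $keep is 0xFF where the character is kept and 0x00 where it is masked.
    (my $keep = $bits) =~ tr/01/\x{FF}\x{00}/;
    # $fill is 'x' where the character is masked and 0x00 elsewhere.
    (my $fill = $bits) =~ tr/01/\x{00}x/;

    # String bitwise AND blanks the masked positions, OR writes the 'x'.
    return ($seq & $keep) | $fill;
}

my $seq  = "AGACGAGTA";
my $mask = '';
vec($mask, $_, 1) = 1 for 2 .. 6;

print "Res: ", apply_mask($seq, $mask), "\n";
```

The bit vector itself is untouched, so masks can still be combined with $mask1 | $mask2 and $mask1 & $mask2 before being applied. Whether this is actually faster for strings of ~10e5 characters would need to be measured, for example with the Benchmark module.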

Replies are listed 'Best First'.

Re: Masking part of a string
by ikegami (Patriarch) on Jun 27, 2007 at 13:31 UTC
by johngg (Canon) on Jun 27, 2007 at 14:36 UTC
by ikegami (Patriarch) on Jun 27, 2007 at 15:02 UTC
by johngg (Canon) on Jun 27, 2007 at 16:20 UTC

Re: Masking part of a string
by ferreira (Chaplain) on Jun 27, 2007 at 19:44 UTC