On my system, this gives the same output as yours. I don't know if it's better, but it is shorter, and it can conveniently use an array instead of a hash. You could also experiment with different tradeoffs on lookup table sizes:my @CONV = glob( "{T,C,A,G}" x 4 ); my $dna = join "", @CONV[ unpack "C*", $raw ];
For some reason, I had byte-order issues doing this. Of course, you must also be careful that $raw is padded to a multiple of 16 bits!## takes 16 bits (= 8 bases = unsigned short) at a time my @CONV = glob( "{T,C,A,G}" x 8 ); my $dna = join "", @CONV[ unpack "S*", $raw ];
Another cute trick I can think of is that you can do some bit-twiddling to implement the M-blocks (apparently lowercasing a range of characters). In ASCII, you can toggle the case of an alphabetic character by bitwise-XOR'ing it with the space character. So I think you can rewrite:
assubstr($dna, $_, $mblock{$_}, lc(substr($dna, $_, $mblock{$_})))
Alternatively, you could use %mblock to generate a long mask of chr(0)'s and chr(32)'s that you can XOR with the entire $dna. Again, probably not a big deal but certainly higher cute-value.substr($dna, $_, $mblock{$_}) ^= (" " x $mblock{$_});
Of course you could always fix M,N blocks on-the-fly, as you are unpacking them from $raw, but that would require some more work. Since I'm typing one-handed these days and it takes me forever, I think I will pass on playing with some code that does that! ;)
blokhead
In reply to Re: Parsing .2bit DNA files
by blokhead
in thread Parsing .2bit DNA files
by Limbic~Region
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |