in reply to Parsing .2bit DNA files
On my system, this gives the same output as yours. I don't know if it's better, but it is shorter, and it can conveniently use an array instead of a hash:

    my @CONV = glob( "{T,C,A,G}" x 4 );
    my $dna = join "", @CONV[ unpack "C*", $raw ];

You could also experiment with different tradeoffs on lookup table sizes:

    ## takes 16 bits (= 8 bases = unsigned short) at a time
    my @CONV = glob( "{T,C,A,G}" x 8 );
    my $dna = join "", @CONV[ unpack "S*", $raw ];

For some reason, I had byte-order issues doing this. Of course, you must also be careful that $raw is padded to a multiple of 16 bits!
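A likely cause of the byte-order trouble is that the "S" template reads shorts in the machine's native order, which is little-endian on x86, while the bases are packed with the first base in the high bits. Here is a minimal sketch that uses "n" (big-endian 16-bit) instead and pads odd-length input. The sample bytes in $raw and the names @conv8, @conv16, and $padded are made up for illustration, and it assumes the T=00, C=01, A=10, G=11 encoding implied by the glob order above.

```perl
use strict;
use warnings;

# Hypothetical input: 3 bytes = 12 bases, 2 bits per base,
# first base in the high bits (T=00, C=01, A=10, G=11).
my $raw = pack "C*", 0x1B, 0x00, 0xFF;

# Byte-at-a-time table: 256 entries of 4 bases each.
my @conv8 = glob( "{T,C,A,G}" x 4 );
my $dna8  = join "", @conv8[ unpack "C*", $raw ];

# 16-bit table: 65536 entries of 8 bases each.
my @conv16 = glob( "{T,C,A,G}" x 8 );

# Pad a copy of the data to a whole number of 16-bit words.
my $padded = $raw;
$padded .= "\0" if length($padded) % 2;

# "n" always reads big-endian shorts, regardless of the host CPU.
my $dna16 = join "", @conv16[ unpack "n*", $padded ];

# Drop the bases that came from the padding byte.
$dna16 = substr $dna16, 0, 4 * length $raw;

print "$dna8\n$dna16\n";
```

Both tables should give the same string. The larger table costs more memory and setup time, so whether it wins depends on how much data you are decoding.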
Another cute trick I can think of is that you can do some bit-twiddling to implement the M-blocks (apparently lowercasing a range of characters). In ASCII, you can toggle the case of an alphabetic character by bitwise-XOR'ing it with the space character. So I think you can rewrite:

    substr($dna, $_, $mblock{$_}, lc(substr($dna, $_, $mblock{$_})))

as

    substr($dna, $_, $mblock{$_}) ^= (" " x $mblock{$_});

Alternatively, you could use %mblock to generate a long mask of chr(0)'s and chr(32)'s that you can XOR with the entire $dna. Again, probably not a big deal but certainly higher cute-value.
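To make the two XOR variants concrete, here is a rough sketch of both. The sample sequence and the contents of %mblock (start offset => length) are invented for the example, and it assumes $dna is all uppercase to begin with, since XOR toggles case rather than forcing lowercase.

```perl
use strict;
use warnings;

# Hypothetical data: an uppercase sequence and mask blocks as start => length.
my $dna    = "TCAGTTTTGGGGGACT";
my %mblock = ( 2 => 3, 10 => 4 );

# Variant 1: XOR each block in place with a run of spaces (chr 32).
my $per_block = $dna;
substr( $per_block, $_, $mblock{$_} ) ^= " " x $mblock{$_} for keys %mblock;

# Variant 2: build one mask of chr(0) and chr(32), then XOR the whole string once.
my $mask = "\0" x length $dna;
substr( $mask, $_, $mblock{$_} ) = " " x $mblock{$_} for keys %mblock;
my $masked = $dna ^ $mask;

print "$per_block\n$masked\n";
```

One difference worth knowing: if two blocks overlapped, variant 1 would flip the shared bases twice and leave them uppercase, while the mask in variant 2 only ever sets chr(32) once per position. Also, under the "bitwise" feature (enabled by `use v5.28` and later), `^` is numeric-only and the string form is spelled `^.` instead.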
Of course you could always fix M,N blocks on-the-fly, as you are unpacking them from $raw, but that would require some more work. Since I'm typing one-handed these days and it takes me forever, I think I will pass on playing with some code that does that! ;)
blokhead
Re^2: Parsing .2bit DNA files
by bart (Canon) on Mar 06, 2008 at 11:48 UTC