A little over 10x the data takes a little over 10x as long, 36 seconds, though it needed a few tweaks to stop it from blowing gobs of memory:
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
my $sequence = join'', map{ ( qw[ A C G T ] )[ rand 4 ] } 1 .. 100_000;
$sequence x= 65;
my $start = time;
my( %one, %two, %three );
print 'starting one';
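# Note: the "- 100_000" in the bound stops one chunk short, so the final
# 100_000 characters are not counted (the 'one' totals sum to 6_400_000).
for( my $i=0; $i < length( $sequence ) - 100_000; $i += 100_000 ) {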
$one{ $_ }++ for unpack '(A1)*', substr $sequence, $i, 100_000;
}
print 'starting two';
for( my $i=0; $i < length( $sequence ) - 99_999; $i += 99_999 ) {
$two{ $_ }++ for unpack '(A2X)*', substr $sequence, $i, 100_000;
}
delete @two{ (qw'A C G T') };
print 'starting three';
for( my $i=0; $i < length( $sequence ) - 99_999; $i += 99_999 ) {
$three{ $_ }++ for unpack '(A3XX)*', substr $sequence, $i, 100_002;
}
delete @three{ (qw'AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT') };
printf "Elapsed: %3f seconds\n", time - $start;
<STDIN>;
print "one\n@{[%one]}";
print "two\n@{[%two ]}";
print "three\n@{[%three]}";
__END__
P:\test>499988
starting one
starting two
starting three
Elapsed: 36.257410 seconds
one
A 1604992 T 1613184 C 1589696 G 1592128
two
AC 404168 CC 403645 TG 409301 AT 408133 AA 413074 CT 409560 CG 399941 TA 409562
GC 400133 CA 401370 GT 407740 AG 404684 TC 406569 GA 406053 TT 412940 GG 403062
three
GCC 97240 AGT 100880 TGT 105820 TGA 99385 CGA 99580 ATC 103155 AAC 102765
AGC 100490 TAC 99060 TCG 100490 ACA 96460 CTG 103155 CCG 97500 GCA 98670
GTG 100230 AAG 100750 CAC 98995 GTT 105820 AGA 102310 ACC 101660 CCA 103480
TGG 101985 CGC 98475 CTC 98995 TTG 105300 TAA 102700 CAG 100880 ACG 101400
AAA 105365 ATG 100620 GTA 100360 CTT 102570 TAG 104650 GGA 104780 GTC 101335
TGC 102115 TCA 102765 ATT 102765 TAT 103155 AAT 104195 ACT 104650 GAC 103350
CAA 100230 GGT 98865 TCC 104130 TTT 101790 AGG 101010 CGT 102180 CGG 99710
CAT 101270 ATA 101595 CCC 100620 GGG 100360 GAG 98410 TTA 102765 GAT 99515
CTA 104845 TCT 99190 TTC 103090 GCG 100555 GGC 99060 GAA 104780 GCT 103675
CCT 102050
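
For anyone who finds the overlapping-chunk arithmetic above hard to follow, here is a minimal sketch of the same memory-saving idea: walk the sequence in fixed-size pieces and let each piece overlap the next by n-1 characters, so that no n-gram spanning a boundary is lost or counted twice. It uses a plain substr sliding window rather than the unpack templates in the script, and count_ngrams() with its parameters is a hypothetical helper, not part of the code that was timed.

```perl
use strict;
use warnings;

# Hypothetical helper: count overlapping n-grams of length $n in $seq,
# looking at no more than $chunk + $n - 1 characters at a time.
sub count_ngrams {
    my( $seq, $n, $chunk ) = @_;
    my %count;
    my $len = length $seq;
    for( my $pos = 0; $pos < $len; $pos += $chunk ) {
        # Take n-1 extra characters so n-grams that straddle the
        # chunk boundary are counted exactly once.
        my $piece = substr( $seq, $pos, $chunk + $n - 1 );
        my $last  = length( $piece ) - $n;   # negative if piece is too short
        $count{ substr( $piece, $_, $n ) }++ for 0 .. $last;
    }
    return \%count;
}

my $counts = count_ngrams( 'ACGTACGT', 2, 3 );
print "$_ $counts->{ $_ }\n" for sort keys %$counts;
```

On the toy input this prints AC 2, CG 2, GT 2 and TA 1, one pair per line. It makes no claim about speed relative to the unpack version; it is only meant to show the boundary handling.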
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.