Re^2: Question about speeding a regexp count

Replies are listed 'Best First'.
Re^3: Question about speeding a regexp count by BrowserUk (Patriarch) on Oct 14, 2005 at 16:59 UTC
Yes. It is mentioned as a caveat in the post that where the subsequence length is not an exact multiple of the sequence length, that the short subsequences will need to be removed (as I did in my second attempt at Re^3: Question about speeding a regexp count). But then again, it is so slow compared to the other methods that it doesn't really warrent consideration anyway. I did come up with this version: sub browser2 { my %count; $count{ A } = $seq =~ tr[A][A]; $count{ C } = $seq =~ tr[C][C]; $count{ G } = $seq =~ tr[G][G]; $count{ T } = $seq =~ tr[T][T]; for( qw'AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT' ) { my $p=0; $count{ $_ }++ while $p = 1+index $seq, $_, $p; } for( qw[ TTT TTG TTC TTA TGT TGG TGC TGA TCT TCG TCC TCA TAT TAG TAC TA +A GTT GTG GTC GTA GGT GGG GGC GGA GCT GCG GCC GCA GAT GAG GAC GA +A CTT CTG CTC CTA CGT CGG CGC CGA CCT CCG CCC CCA CAT CAG CAC CA +A ATT ATG ATC ATA AGT AGG AGC AGA ACT ACG ACC ACA AAT AAG AAC AA +A ] ) { my $p=0; $count{ $_ }++ while $p = 1+index $seq, $_, $p; } 1; } [download] Which is a lot quicker, but still much slower than skeeve's and one of sauoq. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l]
Re^4: Question about speeding a regexp count by ikegami (Patriarch) on Oct 14, 2005 at 17:12 UTC
The caveat doesn't apply because my sequence was a multiple of 2 or 3 characters. Therefore, my bug report stands.	[reply]
Re^5: Question about speeding a regexp count by BrowserUk (Patriarch) on Oct 15, 2005 at 05:36 UTC
my bug report stands. My (described) two-line bugfix applied. No dramatic difference to the performance of the algorithm #! perl -slw use strict; use Time::HiRes qw[ time ]; use List::Util qw[sum];; my $sequence = join'', map{ ( qw[ A C G T ] )[ rand 4 ] } 1 .. 600_000 +; my $start = time; my( %one, %two, %three ); $one{ $_ }++ for unpack '(A1)', $sequence; $two{ $_ }++ for unpack '(A2X)', $sequence; delete @two{ (qw'A C G T') };; $three{ $_ }++ for unpack '(A3XX)*', $sequence; delete @three{ (qw'AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT') } +; printf "Elapsed: %3f seconds\n", time - $start; print sum values %one; print sum values %two; print sum values %three; __END__ P:\test>junk Elapsed: 3.390625 seconds 600000 599999 599998 [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l]