in reply to Performance Tuning: Searching Long-Sequence Permutations

One immediate saving that leaps out at me is that you can replace this

while ($sub =~ /([GCATN])/g) {
    if ($1 eq 'G') { $bases{g}++; }
    if ($1 eq 'C') { $bases{c}++; }
    if ($1 eq 'A') { $bases{a}++; }
    if ($1 eq 'T') { $bases{t}++; }
    if ($1 eq 'N') { $bases{n}++; }
}

with

$bases{ lc($1) }++ while $sub =~ /([GCATN])/g;

That should speed that bit up. If you can live with your keys being uppercase rather than lowercase, omit the call to lc. I realise that isn't where most of the time goes, though.
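For what it's worth, here is a minimal standalone sketch of the one-line version. The sample string is made up, and I'm assuming %bases starts out empty; neither comes from the original code.

```perl
use strict;
use warnings;

# Hypothetical sample; the real $sub would be a far longer sequence.
my $sub = 'GATTACANNGC';

# Assumed to start empty; the original post does not show its setup.
my %bases;

# Capture each base and use its lowercased form as the hash key.
$bases{ lc($1) }++ while $sub =~ /([GCATN])/g;

# Print the tally for each base, defaulting to 0 if a base never appeared.
printf "%s=%d\n", $_, $bases{$_} || 0 for qw(g c a t n);
```

The statement-modifier form of while re-runs the match in scalar context, so each pass picks up where the previous match left off, just as the block form does.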

When you are dealing with such long strings, and especially if you are matching against them multiple times, studying the string (with Perl's built-in study function) can make a remarkable difference. I'm not sure whether this holds when the string contains so few unique characters, but it would be worth testing.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Re: Performance Tuning: Searching Long-Sequence Permutations
by Jasper (Chaplain) on Jun 27, 2003 at 12:42 UTC
    At a guess, this should be _much_ faster:
    $bases{g} += $sub =~ y/G//;
    Because there's no pattern matching (as such), looping, or whatever involved.
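    Spelled out for all five bases, it might look something like the sketch below. The sample string and the zero-initialised %bases are my own assumptions for illustration. Note that y/// (tr///) builds its character list at compile time, so you can't interpolate a variable into it; hence one statement per base.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample; the real $sub would be a far longer sequence.
    my $sub = 'GATTACANNGC';

    # Assumed initial state: every counter starts at zero.
    my %bases = map { $_ => 0 } qw(g c a t n);

    # With an empty replacement list and no /d modifier, y/// leaves the
    # string unchanged and simply returns how many characters it matched.
    $bases{g} += $sub =~ y/G//;
    $bases{c} += $sub =~ y/C//;
    $bases{a} += $sub =~ y/A//;
    $bases{t} += $sub =~ y/T//;
    $bases{n} += $sub =~ y/N//;

    printf "%s=%d\n", $_, $bases{$_} for qw(g c a t n);
    ```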

    edit: A quick benchmark shows that 5 transliterations are 100 times faster than the while loop for a $sub 10,000 characters long. That's better than I expected.
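    If you want to try the comparison yourself, something along these lines should do. This is a sketch, not the exact benchmark I ran: the random test string and the sub names count_regex and count_tr are made up here, and the timings will vary with your perl and machine.

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Hypothetical test data: 10,000 randomly chosen bases.
    my @alphabet = qw(G C A T N);
    my $sub = join '', map { $alphabet[ rand @alphabet ] } 1 .. 10_000;

    # Count bases with the regex-driven while loop.
    sub count_regex {
        my %bases;
        $bases{ lc($1) }++ while $sub =~ /([GCATN])/g;
        return \%bases;
    }

    # Count bases with one transliteration per base.
    sub count_tr {
        my %bases;
        $bases{g} = $sub =~ y/G//;
        $bases{c} = $sub =~ y/C//;
        $bases{a} = $sub =~ y/A//;
        $bases{t} = $sub =~ y/T//;
        $bases{n} = $sub =~ y/N//;
        return \%bases;
    }

    # Run each candidate for at least one CPU second and print a
    # comparison table of rates.
    cmpthese( -1, { regex => \&count_regex, tr => \&count_tr } );
    ```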

    Also, although it's really a microoptimisation, I think it's always worthwhile remembering that ++$a is more efficient than $a++.

    Jasper