in reply to Performance Tuning: Searching Long-Sequence Permutations

You're recompiling those regexes every time through the loop, which is fairly expensive. Use the "qr" syntax to create compiled regexes in the @promoters_regex array, thusly:

$promoters_regex[0] = qr/[NT][GCATN][NG][NC][NG][NT][NG]/;
$promoters_regex[1] = qr/[NG][NT][NG][NC][NG][GCATN][NT]/;
$promoters_regex[2] = qr/[NA][GCATN][NC][NG][NC][NA][NC]/;
$promoters_regex[3] = qr/[NC][NA][NC][NG][NC][GCATN][NA]/;
...
$permutations_total++ while $string =~ $promoters_regex[$j];

You would also find it a bit faster if you could combine all the regexes into a big alternative regex, but then you couldn't match multiple promoters that overlap each other so that's probably gonna require lookaheads of some kind. Hm....
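One way the lookahead idea might look, as a minimal sketch rather than a tested solution for the original data: each promoter is wrapped in a zero-width lookahead, so a match consumes no text and a later match may start inside an earlier one. The names @lookahead_regex and count_overlapping are made up for illustration, and the sample strings are invented. The patterns stay separate here instead of being merged into one alternation, because a single combined lookahead would count a position only once even if several promoters matched there.

```perl
use strict;
use warnings;

my @promoters_regex = (
    qr/[NT][GCATN][NG][NC][NG][NT][NG]/,
    qr/[NG][NT][NG][NC][NG][GCATN][NT]/,
);

# Wrap each promoter in a zero-width lookahead, compiled once up front
# so the loop below never has to rebuild a pattern.
my @lookahead_regex = map { qr/(?=$_)/ } @promoters_regex;

# Hypothetical helper: count every start position where a pattern in the
# list matches, including starts that overlap an earlier match.
sub count_overlapping {
    my ($string, @regexes) = @_;
    my $total = 0;
    for my $re (@regexes) {
        # List-context //g returns one element per match; assigning to
        # the empty list and reading that in scalar context gives the count.
        $total += () = $string =~ /$re/g;
    }
    return $total;
}

# Tiny demonstration with an invented pattern: TGT occurs at offsets 0 and 2.
print count_overlapping('TGTGT', qr/(?=TGT)/), "\n";   # prints 2
```

With the real data the call would presumably be count_overlapping($string, @lookahead_regex). Whether this ends up faster than the plain per-regex loop would need to be measured.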

PS: Most of the time when you write a for loop with an index iterating through the elements of an array, you'd be better off with a foreach loop. This is one of those cases:

for (@promoters_regex) { $total++ while $string =~ /$_/g; }

    -- Chip Salzenberg, Free-Floating Agent of Chaos

Re: Performance Tuning: Searching Long-Sequence Permutations
by Abigail-II (Bishop) on Jun 26, 2003 at 23:37 UTC
    $permutations_total++ while $string =~ $promoters_regex[$j];

    I guess you want:

    $permutations_total++ while $string =~ /$promoters_regex[$j]/g;

    here, otherwise you'll never terminate once there's a match. But now, qr will actually be slower, because you are interpolating the compiled regex and recompiling it. And if there are many matches, some gain could be made by writing it as:

    { no warnings 'numeric'; $permutations_total += $string =~ /$promoters_regex[$j]/g; }

    Abigail

      WRT interpolation in /$pat/g: Actually it is fast. When the entire pattern consists of an interpolated qr reference, no recompilation takes place at runtime, so the speed gain of precompilation is not lost. You can observe this with a combination of perl -Dts and reading the source code of pp_regcomp.

      WRT the missing //g: It's a fair cop, but society's to blame. :-)

          -- Chip Salzenberg, Free-Floating Agent of Chaos

      As I stated in /o is dead, long live qr//!, the form /$promoters_regex[$j]/g is not interpolated. The qr object is the entire regex and is not stringified. Your expected overhead does not materialize in this case. Writing it as / $promoters_regex[$j] /gx would trigger the recompilation overhead though.

      Never thought I'd be correcting Abigail, but this will only add 1 to $permutations_total. A //g match in scalar context only ever returns 1 (or 0), no matter how many times it matches. Unfortunately, to make it work, you have to run it through an array first:
      $permutations_total += @array = $string =~ /$promoters_regex[$j]/g;
      Which isn't very nice. Don't know how to get around that.

      Jasper
        Eh, right, but it doesn't have to be an array. A list will do, even if it's empty:
        $permutations_total += () = $string =~ /$promoters_regex[$j]/g;

        Abigail

        Assign it to an empty list:
        $permutations_total += () = $string =~ /$promoters_regex[$j]/g;
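The scalar versus list context point above can be checked with a small sketch. The helper names and the sample sequence are invented for illustration. Each helper works on its own copy of the string, so the match position left behind by one //g does not affect the other.

```perl
use strict;
use warnings;

# Scalar context: //g only reports whether the next match succeeded,
# so this can never contribute more than 1.
sub count_scalar_context {
    my ($string, $re) = @_;
    my $matched = $string =~ /$re/g;
    return $matched ? 1 : 0;
}

# List context: //g returns every match; assigning that list to () and
# evaluating the assignment in scalar context yields the match count.
sub count_list_context {
    my ($string, $re) = @_;
    my $count = () = $string =~ /$re/g;
    return $count;
}

my $sample = 'GATTACAGATTACA';   # made-up sequence with two matches
my $re     = qr/GATTACA/;

print count_scalar_context($sample, $re), "\n";   # prints 1
print count_list_context($sample, $re), "\n";     # prints 2
```

Note that a plain //g, unlike a lookahead, does not count matches that overlap each other.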