comment on

There are a number of local improvements you can make.

### Typical sequence lengths are 10-50k letters
my $sub = "CGAGCGTTGACGCNNAGCTAGT";

### Count base frequencies
while ($sub =~ /([GCATN])/g) {
    if ($1 eq 'G') { $bases{g}++; }
    if ($1 eq 'C') { $bases{c}++; }
    if ($1 eq 'A') { $bases{a}++; }
    if ($1 eq 'T') { $bases{t}++; }
    if ($1 eq 'N') { $bases{n}++; }
    }

### I need to create an array of bases to pass to rand()
my @bases = (
    ('G') x $bases{g},
    ('C') x $bases{c},
    ('A') x $bases{a},
    ('T') x $bases{t},
    ('N') x $bases{n},
);
[download]

can become

my $sub = "CGAGCGTTGACGCNNAGCTAGT";
my @bases = split //, $sub;
[download]

Then, you build your regexes. Use qr. That'll make a huge difference. (In my benchmarking below, I more-than-doubled my number run per second from 81/s to 174/s over a 3 CPU second run.)

The for-loops definitely also can use a lot of improvement. This is how I would write your script:

   my $sub = 'CGAGCGTTGACGCNNAGCTAGT';
   my @bases = split //, $sub;

   my $count = 100;
   my $len = length $sub;

   @promoters_regex = (
      qr'[NT][GCATN][NG][NC][NG][NT][NG]',
      qr'[NG][NT][NG][NC][NG][GCATN][NT]',
      qr'[NA][GCATN][NC][NG][NC][NA][NC]',
      qr'[NC][NA][NC][NG][NC][GCATN][NA]',
   );

   my $permutations_total = 0;
   while ($count--)
   {
      my $ilen = $len;
      my $string;
      $string .= $bases[rand @bases] while $ilen--;

      foreach my $regex (@promoters_regex)
      {
         $permutations_totals += scalar($string =~ m/$regex/g);
      }
   }
[download]

The obligatory benchmark information:

With count == 100:
Benchmark: running mine, yours, each for at least 10 CPU seconds...
      mine: 11 wallclock secs (10.88 usr +  0.00 sys = 10.88 CPU) @ 17
+2.52/s (n=1877)
     yours: 19 wallclock secs (18.49 usr +  0.01 sys = 18.50 CPU) @ 38
+.76/s (n=717)
        Rate yours  mine
yours 38.8/s    --  -78%
mine   173/s  345%    --

With count == 1000:
Benchmark: running mine, yours, each for at least 30 CPU seconds...
      mine: 33 wallclock secs (31.83 usr +  0.00 sys = 31.83 CPU) @ 17
+.78/s (n=566)
     yours: 33 wallclock secs (32.43 usr +  0.00 sys = 32.43 CPU) @  7
+.68/s (n=249)
        Rate yours  mine
yours 7.68/s    --  -57%
mine  17.8/s  132%    --
[download]

The reason for seeming improvement that yours has if we increase the count is because the largest speed improvement is building the @bases array. Taking that out, the Benchmark looks like:

With count == 100:
Benchmark: running mine, yours, each for at least 10 CPU seconds...
      mine: 11 wallclock secs (10.43 usr +  0.00 sys = 10.43 CPU) @ 17
+4.98/s (n=1825)
     yours: 11 wallclock secs (10.26 usr +  0.00 sys = 10.26 CPU) @ 74
+.37/s (n=763)
        Rate yours  mine
yours 74.4/s    --  -57%
mine   175/s  135%    --

With count == 1000:
Benchmark: running mine, yours, each for at least 30 CPU seconds...
      mine: 33 wallclock secs (31.58 usr +  0.00 sys = 31.58 CPU) @ 17
+.42/s (n=550)
     yours: 32 wallclock secs (31.24 usr +  0.00 sys = 31.24 CPU) @  7
+.55/s (n=236)
        Rate yours  mine
yours 7.55/s    --  -57%
mine  17.4/s  131%    --
[download]

Both yours and mine seem to scale equivalently, which makes sense. Though, on inspection, your should degrade faster than mine as $sub gets really large. (You keep computing the length of $sub every iteration, even though it's static.)

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

In reply to Re: Performance Tuning: Searching Long-Sequence Permutations by dragonchild
in thread Performance Tuning: Searching Long-Sequence Permutations by Itatsumaki

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.