in reply to Performance Tuning: Searching Long-Sequence Permutations
can become### Typical sequence lengths are 10-50k letters my $sub = "CGAGCGTTGACGCNNAGCTAGT"; ### Count base frequencies while ($sub =~ /([GCATN])/g) { if ($1 eq 'G') { $bases{g}++; } if ($1 eq 'C') { $bases{c}++; } if ($1 eq 'A') { $bases{a}++; } if ($1 eq 'T') { $bases{t}++; } if ($1 eq 'N') { $bases{n}++; } } ### I need to create an array of bases to pass to rand() my @bases = ( ('G') x $bases{g}, ('C') x $bases{c}, ('A') x $bases{a}, ('T') x $bases{t}, ('N') x $bases{n}, );
Then, you build your regexes. Use qr. That'll make a huge difference. (In my benchmarking below, I more-than-doubled my number run per second from 81/s to 174/s over a 3 CPU second run.)my $sub = "CGAGCGTTGACGCNNAGCTAGT"; my @bases = split //, $sub;
The for-loops definitely also can use a lot of improvement. This is how I would write your script:
The obligatory benchmark information:my $sub = 'CGAGCGTTGACGCNNAGCTAGT'; my @bases = split //, $sub; my $count = 100; my $len = length $sub; @promoters_regex = ( qr'[NT][GCATN][NG][NC][NG][NT][NG]', qr'[NG][NT][NG][NC][NG][GCATN][NT]', qr'[NA][GCATN][NC][NG][NC][NA][NC]', qr'[NC][NA][NC][NG][NC][GCATN][NA]', ); my $permutations_total = 0; while ($count--) { my $ilen = $len; my $string; $string .= $bases[rand @bases] while $ilen--; foreach my $regex (@promoters_regex) { $permutations_totals += scalar($string =~ m/$regex/g); } }
The reason for seeming improvement that yours has if we increase the count is because the largest speed improvement is building the @bases array. Taking that out, the Benchmark looks like:With count == 100: Benchmark: running mine, yours, each for at least 10 CPU seconds... mine: 11 wallclock secs (10.88 usr + 0.00 sys = 10.88 CPU) @ 17 +2.52/s (n=1877) yours: 19 wallclock secs (18.49 usr + 0.01 sys = 18.50 CPU) @ 38 +.76/s (n=717) Rate yours mine yours 38.8/s -- -78% mine 173/s 345% -- With count == 1000: Benchmark: running mine, yours, each for at least 30 CPU seconds... mine: 33 wallclock secs (31.83 usr + 0.00 sys = 31.83 CPU) @ 17 +.78/s (n=566) yours: 33 wallclock secs (32.43 usr + 0.00 sys = 32.43 CPU) @ 7 +.68/s (n=249) Rate yours mine yours 7.68/s -- -57% mine 17.8/s 132% --
Both yours and mine seem to scale equivalently, which makes sense. Though, on inspection, your should degrade faster than mine as $sub gets really large. (You keep computing the length of $sub every iteration, even though it's static.)With count == 100: Benchmark: running mine, yours, each for at least 10 CPU seconds... mine: 11 wallclock secs (10.43 usr + 0.00 sys = 10.43 CPU) @ 17 +4.98/s (n=1825) yours: 11 wallclock secs (10.26 usr + 0.00 sys = 10.26 CPU) @ 74 +.37/s (n=763) Rate yours mine yours 74.4/s -- -57% mine 175/s 135% -- With count == 1000: Benchmark: running mine, yours, each for at least 30 CPU seconds... mine: 33 wallclock secs (31.58 usr + 0.00 sys = 31.58 CPU) @ 17 +.42/s (n=550) yours: 32 wallclock secs (31.24 usr + 0.00 sys = 31.24 CPU) @ 7 +.55/s (n=236) Rate yours mine yours 7.55/s -- -57% mine 17.4/s 131% --
------
We are the carpenters and bricklayers of the Information Age.
Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: Performance Tuning: Searching Long-Sequence Permutations
by diotalevi (Canon) on Jun 27, 2003 at 04:30 UTC | |
by BrowserUk (Patriarch) on Jun 27, 2003 at 11:20 UTC | |
by diotalevi (Canon) on Jun 27, 2003 at 11:37 UTC | |
by BrowserUk (Patriarch) on Jun 27, 2003 at 12:01 UTC | |
by diotalevi (Canon) on Jun 27, 2003 at 12:03 UTC |