Re^4: Removing Flanking "N"s in a DNA String

How about “apples and oranges” or “it’s a really worthless benchmark”?

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw( cmpthese );

sub run_tests {
    my ( $len_remove, $len_keep, $num_repeat ) = @_;

    my $remove = 'N' x $len_remove;
    my $keep = 'O' x $len_keep;

    my %test = (
        front     => "$remove$keep" x $num_repeat,
        tail      => "$keep$remove" x $num_repeat,
        both_ends => "$remove$keep" x $num_repeat . $remove,
        nothing   => "$keep$remove" x $num_repeat . $keep,
    );

    print "$len_remove chars to remove, $len_keep chars long kept sequences, $num_repeat repetitions.\n";
    for my $type ( keys %test ) {
        print "Measuring removing at $type.\n";
        cmpthese -2 => {
            one_sub => sub { for( 1 .. 1000 ) { s{^N*(.*?)N*$}{$1} for my $copy = $test{$type} } },
            two_sub => sub { for( 1 .. 1000 ) { s{^N*}{}, s{N*$}{} for my $copy = $test{$type} } },
        };
    }

    print "\n";
}

$|++;

run_tests 4, 4, 1;
run_tests 20, 20, 1;
run_tests 20, 20, 50;
run_tests 4, 4, 20;
run_tests 4, 12, 10;
run_tests 4, 100, 100;

This gives me:

4 chars to remove, 4 chars long kept sequences, 1 repetitions.
Measuring removing at front.
         Rate one_sub two_sub
one_sub 125/s      --    -53%
two_sub 269/s    115%      --
Measuring removing at tail.
         Rate one_sub two_sub
one_sub 126/s      --    -54%
two_sub 276/s    120%      --
Measuring removing at nothing.
         Rate one_sub two_sub
one_sub 102/s      --    -49%
two_sub 201/s     98%      --
Measuring removing at both_ends.
         Rate one_sub two_sub
one_sub 122/s      --    -54%
two_sub 266/s    118%      --

20 chars to remove, 20 chars long kept sequences, 1 repetitions.
Measuring removing at front.
          Rate one_sub two_sub
one_sub 85.8/s      --    -48%
two_sub  165/s     92%      --
Measuring removing at tail.
          Rate one_sub two_sub
one_sub 85.8/s      --    -48%
two_sub  165/s     93%      --
Measuring removing at nothing.
          Rate one_sub two_sub
one_sub 48.8/s      --    -40%
two_sub 80.8/s     65%      --
Measuring removing at both_ends.
          Rate one_sub two_sub
one_sub 85.0/s      --    -48%
two_sub  162/s     91%      --

20 chars to remove, 20 chars long kept sequences, 50 repetitions.
Measuring removing at front.
          Rate one_sub two_sub
one_sub 2.20/s      --    -29%
two_sub 3.10/s     41%      --
Measuring removing at tail.
          Rate one_sub two_sub
one_sub 2.16/s      --    -29%
two_sub 3.03/s     40%      --
Measuring removing at nothing.
          Rate one_sub two_sub
one_sub 2.15/s      --    -29%
two_sub 3.04/s     42%      --
Measuring removing at both_ends.
          Rate one_sub two_sub
one_sub 2.16/s      --    -27%
two_sub 2.99/s     38%      --

4 chars to remove, 4 chars long kept sequences, 20 repetitions.
Measuring removing at front.
          Rate one_sub two_sub
one_sub 23.3/s      --    -36%
two_sub 36.6/s     57%      --
Measuring removing at tail.
          Rate one_sub two_sub
one_sub 23.8/s      --    -35%
two_sub 36.7/s     54%      --
Measuring removing at nothing.
          Rate one_sub two_sub
one_sub 22.9/s      --    -35%
two_sub 35.1/s     53%      --
Measuring removing at both_ends.
          Rate one_sub two_sub
one_sub 23.3/s      --    -37%
two_sub 36.7/s     58%      --

4 chars to remove, 12 chars long kept sequences, 10 repetitions.
Measuring removing at front.
          Rate one_sub two_sub
one_sub 24.2/s      --    -35%
two_sub 37.3/s     54%      --
Measuring removing at tail.
          Rate one_sub two_sub
one_sub 23.8/s      --    -37%
two_sub 37.9/s     59%      --
Measuring removing at nothing.
          Rate one_sub two_sub
one_sub 22.7/s      --    -35%
two_sub 34.9/s     54%      --
Measuring removing at both_ends.
          Rate one_sub two_sub
one_sub 24.2/s      --    -36%
two_sub 37.9/s     57%      --

4 chars to remove, 100 chars long kept sequences, 100 repetitions.
Measuring removing at front.
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
        s/iter one_sub two_sub
one_sub   2.14      --    -30%
two_sub   1.50     43%      --
Measuring removing at tail.
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
        s/iter one_sub two_sub
one_sub   2.18      --    -31%
two_sub   1.50     45%      --
Measuring removing at nothing.
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
        s/iter one_sub two_sub
one_sub   2.20      --    -31%
two_sub   1.52     45%      --
Measuring removing at both_ends.
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
        s/iter one_sub two_sub
one_sub   2.19      --    -31%
two_sub   1.50     46%      --

As you can see, the two-subst version is always faster. If you don’t believe me, run the thing through use re 'debug'; and watch what the engine is doing.
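A minimal sketch for anyone who wants to poke at the two approaches outside the benchmark. The sub names `trim_one_sub` and `trim_two_sub` are made up for illustration; the regexes are the ones from the benchmark above. It only checks that both variants return the same string for the four kinds of input tested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single substitution: strip leading and trailing Ns in one pattern.
sub trim_one_sub {
    my ($seq) = @_;
    $seq =~ s{^N*(.*?)N*$}{$1};
    return $seq;
}

# Two substitutions: strip the front, then the tail.
sub trim_two_sub {
    my ($seq) = @_;
    # To watch the engine, add `use re 'debug';` on the line above the
    # substitutions. It prints to STDERR; in modern perls it is lexically
    # scoped, so it only affects the regexes in this sub.
    $seq =~ s{^N*}{};
    $seq =~ s{N*$}{};
    return $seq;
}

# One short sample per benchmark case: front, tail, both_ends, nothing.
for my $seq ( 'NNNNOOOO', 'OOOONNNN', 'NNNNOOOONNNN', 'OOOONNNNOOOO' ) {
    my ( $one, $two ) = ( trim_one_sub($seq), trim_two_sub($seq) );
    printf "%-12s => %-12s %s\n", $seq, $two, $one eq $two ? 'same' : 'DIFFERENT';
}
```

With the debug pragma enabled, the trace for the one-substitution pattern shows how often the lazy `.*?` has to retry `N*$` on longer inputs.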

Makeshifts last the longest.

Re^5: Removing Flanking "N"s in a DNA String by BrowserUk (Patriarch) on Nov 07, 2005 at 16:25 UTC
There are two problems with your benchmark. The time taken to copy the data within the code being benchmarked drowns out the minuscule time spent modifying it. Your benchmark does not account for Benchmark.pm's habit of biasing the tests in favour of the first case run. In the following, the only change I have made to your benchmark is to reverse the naming of the cases so that they will be run in the reverse order. Note how in most cases the "winner" is reversed, and in the few where this is not the case, the differences are within the bounds of experimental error:

Read more... (5 kB)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^6: Removing Flanking "N"s in a DNA String by Perl Mouse (Chaplain) on Nov 07, 2005 at 16:35 UTC
What do you mean "the winner is reversed"? Sure, the names are reversed, but that's to be expected as you reversed the names in the test to prove a point. Which, I think, you didn't, as the tests still favour the solution that uses two substitutions - just naming it one_sub doesn't change that. `Perl --((8:>*`
Re^7: Removing Flanking "N"s in a DNA String by BrowserUk (Patriarch) on Nov 07, 2005 at 16:56 UTC
You're right. Dumb conclusion. I still stand by my benchmark though--until someone shows me what is wrong with it. By moving the test data generation outside of the test, and testing each method twice in reversed order, I believe that I have accounted for the Benchmark.pm "first case" bias, and am more accurately comparing the two methods than any of the other benchmarks posted. Can you debunk that claim?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
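A rough sketch of the kind of setup described here: test data built once outside the timed code, and each method listed twice so that any ordering effect would show up as a difference between the two runs of the same method. This is not BrowserUk's actual code, which is not shown in the thread. The case names, the `run_cases` helper, and the `5_copy_only` baseline are assumptions added for illustration. It relies on `timethese` running cases in sorted name order, which is how Benchmark.pm behaves as far as is known here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw( timethese cmpthese );

# Test data is generated once, outside the timed code.
my $unit = 'N' x 20 . 'O' x 20;
my $seq  = $unit x 50 . 'N' x 20;

# s/// modifies its target, so each call still works on a fresh copy;
# both methods pay the same copy cost.
my $one_sub = sub { my $copy = $seq; $copy =~ s{^N*(.*?)N*$}{$1}; $copy };
my $two_sub = sub { my $copy = $seq; $copy =~ s{^N*}{}; $copy =~ s{N*$}{}; $copy };

# Numeric prefixes fix the run order: one, two, two, one.
my %cases = (
    '1_one_sub'   => $one_sub,
    '2_two_sub'   => $two_sub,
    '3_two_sub'   => $two_sub,
    '4_one_sub'   => $one_sub,
    '5_copy_only' => sub { my $copy = $seq; $copy },  # baseline: copy cost alone
);

# Hypothetical helper: time every case for $count iterations, quietly.
sub run_cases {
    my ($count) = @_;
    return timethese( $count, \%cases, 'none' );
}

cmpthese( run_cases(2000) );
```

If the two runs of the same method differ by about as much as the two methods differ from each other, ordering or noise matters more than the regexes; if not, the comparison between methods stands. The copy-only row gives a feel for how much of each timing is just the string copy.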