Benchmark doesn't tell me what I want to hear

afresh1 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks:

I am trying to benchmark some code to try to figure out where I need to optimize it. In the real script, I am using POE and recieving input from libbt's btget and filtering it through several regular expressions and other stuff to collect the information to do different things with it.

What I have done here is split up a single sub from the real script into different subs that contain the individual parts so I could track down what was slow and see as I make improvements. However, it seems that I can run this same test script 3 times and get 3 different results as to which sub is faster and which is slower, but they are always very close.

Am I just using Benchmark wrong, or what have I screwed up?

I have tried this on 2 different systems, one OpenBSD 3.6 with perl 5.8.5 and a Windows 2000 box with perl 5.8.0 (ActivePerl build 806) with similar results.

Benchmarks:

Benchmark: timing 1000000 iterations of i_noop, i_peers, i_sep...
    i_noop: 13.0096 wallclock secs (10.13 usr +  2.89 sys = 13.02 CPU)
+ @ 76804.92/s (n=1000000)
   i_peers: 13.0293 wallclock secs (10.17 usr +  2.85 sys = 13.02 CPU)
+ @ 76804.92/s (n=1000000)
     i_sep: 12.6381 wallclock secs ( 9.72 usr +  2.92 sys = 12.64 CPU)
+ @ 79113.92/s (n=1000000)
           Rate i_peers  i_noop   i_sep
i_peers 76805/s      --     -0%     -3%
i_noop  76805/s      0%      --     -3%
i_sep   79114/s      3%      3%      --

Benchmark: timing 1000000 iterations of i_noop, i_peers, i_sep...
    i_noop: 14.0463 wallclock secs (11.13 usr +  2.91 sys = 14.04 CPU)
+ @ 71225.07/s (n=1000000)
   i_peers: 14.6951 wallclock secs (11.36 usr +  3.34 sys = 14.70 CPU)
+ @ 68027.21/s (n=1000000)
     i_sep: 15.484 wallclock secs (12.20 usr +  3.29 sys = 15.49 CPU) 
+@ 64557.78/s (n=1000000)
           Rate   i_sep i_peers  i_noop
i_sep   64558/s      --     -5%     -9%
i_peers 68027/s      5%      --     -4%
i_noop  71225/s     10%      5%      --

This one is from the Windows box:
Benchmark: timing 1000000 iterations of i_noop, i_peers, i_sep...
    i_noop: 26 wallclock secs (15.97 usr +  7.00 sys = 22.97 CPU) @ 43
+536.94/s (n=1000000)
   i_peers: 26 wallclock secs (16.84 usr +  6.30 sys = 23.14 CPU) @ 43
+215.21/s (n=1000000)
     i_sep: 23 wallclock secs (16.16 usr +  6.61 sys = 22.76 CPU) @ 43
+927.08/s (n=1000000)
           Rate i_peers  i_noop   i_sep
i_peers 43215/s      --     -1%     -2%
i_noop  43537/s      1%      --     -1%
i_sep   43927/s      2%      1%      --
[download]

and code:

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw( timethese cmpthese :hireswallclock );

my $count = 1_000_000;

my $result = timethese($count, {
        'i_sep'   => sub { my %heap; while (<DATA>) { i_sep(  \%heap, 
+$_) } },
        'i_peers' => sub { my %heap; while (<DATA>) { i_peers(\%heap, 
+$_) } },
        'i_noop'  => sub { my %heap; while (<DATA>) { i_noop( \%heap, 
+$_) } },

});

cmpthese($result);

sub i_sep {
    my ($heap, $result) = @_;
    chomp $result;
    my $task = $heap->{task};

    if ($result eq '----') {
        # do nothing
    } else {
        #print $result, "\n";
    }
}

sub i_peers {
    my ($heap, $result) = @_;
    chomp $result;
    my $task = $heap->{task};

    if ($result =~
        /
        ^\s+(\d+)     # 6
        \s+([\d\.]+)  # 192.168.1.156
        :(\d+)        # :0
        \s+\((\w+)\)  # (ALL)
        \[\w+\s+\w+   # [Cibr 1s
        \(\s*(\w+)\)  # (122bps)
        \^-([\d\+]+)  # ^-001+0
        \]            # ]
        \[\w+\s+\w+^? # [Cibr 1s
        \(\s*(\w+)\)  # (122bps)
        _\d+          # _0
        \]            # ]
        /x
    ) {
        my $peer_id = $1;
        my %peer = (
            Time     => $task->{status}->{Time},
            peer_id  => $peer_id,
            ip       => $2,
            port     => $3,
            complete => $4,
            speed    => $5,
            offset   => $6,
            speed_2  => $7,
            offset_2 => $8,
        );
        $task->{peers}->{$peer_id} = \%peer;
    } else {
        #print $result, "\n";
    }
}


sub i_noop {
    my ($heap, $result) = @_;
}


__DATA__
Total good pieces 390 (100%)
Total archive size 102225004
completed+00000:ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
completed+00024:ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
completed+00048:fc

get http://OpenBSD.somedomain.net/announce.php?info_hash=%2b%d9%df%da%
+0a%14%d4%2
f%cc%b7%0c%af%39%b5%7d%ac%64%4a%05%d1&peer_id=%33%33%c8%98%29%80%d5%70
+%93%46%75%
ae%e2%3f%3b%7d%cb%78%71%4f&key=%ee%58%0f%93%9e%00%11%56&port=6881&uplo
+aded=0&dow
nloaded=0&left=0&event=started&compact=1
Interval 1800
3: Server ready...
5: New peer connected 192.168.1.156
Time 1115237207
 4  192.168.1.156:6881 (OUT)[Cibr    1s( 20bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 1 Peers, Download 0bps Upload 0bps^M
----
Time 1115237208
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237209
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237210
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237211
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237212
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237213
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237214
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237215
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
Time 1115237216
 4  192.168.1.156:6881 (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
 5  192.168.1.156:0    (ALL)[Cibr    1s(122bps)^-001+0][Cibr    1s(122
+bps)_0]
100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M
----
[download]

In the real script, the stuff between the 'Time' line and the '----' line just keeps repeating (sometimes with more peers, depending on who is connected, but that isn't really important for this discussion). With 67 torrents seeding, at 5 lines each per second, that is at least 335 lines per second, and the real script uses about 25% cpu on a celeron 700 (OpenBSD 3.6 - perl, v5.8.5 built for i386-openbsd)

--
andrew

Edit by tye: Changed PRE to CODE around long lines

Comment on Benchmark doesn't tell me what I want to hear Select or Download Code

Replies are listed 'Best First'.
Re: Benchmark doesn't tell me what I want to hear by ikegami (Patriarch) on May 04, 2005 at 23:07 UTC
The first run of the first test will suck `<DATA>` dry. Every other run of that test and every run of the other tests will never call the function being benchmarked. Try: `my @data = <DATA>; my $result = timethese($count, { i_sep => sub { my %heap; foreach (@data) { i_sep( \%heap, $_) } }, i_peers => sub { my %heap; foreach (@data) { i_peers(\%heap, $_) } }, i_noop => sub { my %heap; foreach (@data) { i_noop( \%heap, $_) } }, });` [download] It's good to check the result of tests by adding debug statememnts that will be removed when the tests are known to work.	[reply] [d/l] [select]
Re^2: Benchmark doesn't tell me what I want to hear by afresh1 (Hermit) on May 04, 2005 at 23:24 UTC
Yup, that was it. Thank you very much! -- andrew	[reply]
Re: Benchmark doesn't tell me what I want to hear by eibwen (Friar) on May 04, 2005 at 23:14 UTC
If you're trying to locate the bottleneck in your code, try using Devel::DProf. While your Benchmark results may be informative, they must be considered within the context of your code, with particular emphasis on the relative number of iterations.	[reply]
Re^2: Benchmark doesn't tell me what I want to hear by afresh1 (Hermit) on May 04, 2005 at 23:30 UTC
I did try that, but ran into issues with it. I would guess it is because I am using POE and forking and such. I may pull out the sub I know it spends most of its time in into a test script and try to profile just that. $ perl -d:DProf SeedTorrents.pl Memory fault (core dumped) -- andrew	[reply]
Re: Benchmark doesn't tell me what I want to hear by graff (Chancellor) on May 04, 2005 at 23:29 UTC
If I understand what your saying, I think you'll get better mileage if you start with profiling instead of benchmarking: Benchmarking is best for deciding whether two or more ways of doing the same thing are significantly different in how long they take to do that thing -- it's the tool you use when you believe that a current implementation is taking too long and an alternate method should be quicker, because benchmarking will tell you whether your belief is justified. Profiling is best for dissecting the performance of a functioning script, in order to analyze where most of the time is spent; it's the tool you use to find out which segment(s) of your code (if any) will make the likeliest target(s) for optimization, because it doesn't make sense to optimize parts that are already pretty fast. Check "perldoc Devel::DProf" for more info on how to do profiling. So if you do the profiling first, you'll have a better idea of which portions of code might need to be done some other way, and then you can benchmark the original and alternative methods for these portions to see whether the optimization works.	[reply]