afresh1 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks:

I am trying to benchmark some code to try to figure out where I need to optimize it. In the real script, I am using POE and recieving input from libbt's btget and filtering it through several regular expressions and other stuff to collect the information to do different things with it.

What I have done here is split up a single sub from the real script into different subs that contain the individual parts so I could track down what was slow and see as I make improvements. However, it seems that I can run this same test script 3 times and get 3 different results as to which sub is faster and which is slower, but they are always very close.

Am I just using Benchmark wrong, or what have I screwed up?

I have tried this on 2 different systems, one OpenBSD 3.6 with perl 5.8.5 and a Windows 2000 box with perl 5.8.0 (ActivePerl build 806) with similar results.

Benchmarks:

Benchmark: timing 1000000 iterations of i_noop, i_peers, i_sep... i_noop: 13.0096 wallclock secs (10.13 usr + 2.89 sys = 13.02 CPU) + @ 76804.92/s (n=1000000) i_peers: 13.0293 wallclock secs (10.17 usr + 2.85 sys = 13.02 CPU) + @ 76804.92/s (n=1000000) i_sep: 12.6381 wallclock secs ( 9.72 usr + 2.92 sys = 12.64 CPU) + @ 79113.92/s (n=1000000) Rate i_peers i_noop i_sep i_peers 76805/s -- -0% -3% i_noop 76805/s 0% -- -3% i_sep 79114/s 3% 3% -- Benchmark: timing 1000000 iterations of i_noop, i_peers, i_sep... i_noop: 14.0463 wallclock secs (11.13 usr + 2.91 sys = 14.04 CPU) + @ 71225.07/s (n=1000000) i_peers: 14.6951 wallclock secs (11.36 usr + 3.34 sys = 14.70 CPU) + @ 68027.21/s (n=1000000) i_sep: 15.484 wallclock secs (12.20 usr + 3.29 sys = 15.49 CPU) +@ 64557.78/s (n=1000000) Rate i_sep i_peers i_noop i_sep 64558/s -- -5% -9% i_peers 68027/s 5% -- -4% i_noop 71225/s 10% 5% -- This one is from the Windows box: Benchmark: timing 1000000 iterations of i_noop, i_peers, i_sep... i_noop: 26 wallclock secs (15.97 usr + 7.00 sys = 22.97 CPU) @ 43 +536.94/s (n=1000000) i_peers: 26 wallclock secs (16.84 usr + 6.30 sys = 23.14 CPU) @ 43 +215.21/s (n=1000000) i_sep: 23 wallclock secs (16.16 usr + 6.61 sys = 22.76 CPU) @ 43 +927.08/s (n=1000000) Rate i_peers i_noop i_sep i_peers 43215/s -- -1% -2% i_noop 43537/s 1% -- -1% i_sep 43927/s 2% 1% --
and code:
#!/usr/bin/perl use strict; use warnings; use Benchmark qw( timethese cmpthese :hireswallclock ); my $count = 1_000_000; my $result = timethese($count, { 'i_sep' => sub { my %heap; while (<DATA>) { i_sep( \%heap, +$_) } }, 'i_peers' => sub { my %heap; while (<DATA>) { i_peers(\%heap, +$_) } }, 'i_noop' => sub { my %heap; while (<DATA>) { i_noop( \%heap, +$_) } }, }); cmpthese($result); sub i_sep { my ($heap, $result) = @_; chomp $result; my $task = $heap->{task}; if ($result eq '----') { # do nothing } else { #print $result, "\n"; } } sub i_peers { my ($heap, $result) = @_; chomp $result; my $task = $heap->{task}; if ($result =~ / ^\s+(\d+) # 6 \s+([\d\.]+) # 192.168.1.156 :(\d+) # :0 \s+\((\w+)\) # (ALL) \[\w+\s+\w+ # [Cibr 1s \(\s*(\w+)\) # (122bps) \^-([\d\+]+) # ^-001+0 \] # ] \[\w+\s+\w+^? # [Cibr 1s \(\s*(\w+)\) # (122bps) _\d+ # _0 \] # ] /x ) { my $peer_id = $1; my %peer = ( Time => $task->{status}->{Time}, peer_id => $peer_id, ip => $2, port => $3, complete => $4, speed => $5, offset => $6, speed_2 => $7, offset_2 => $8, ); $task->{peers}->{$peer_id} = \%peer; } else { #print $result, "\n"; } } sub i_noop { my ($heap, $result) = @_; } __DATA__ Total good pieces 390 (100%) Total archive size 102225004 completed+00000:ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff completed+00024:ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff completed+00048:fc get http://OpenBSD.somedomain.net/announce.php?info_hash=%2b%d9%df%da% +0a%14%d4%2 f%cc%b7%0c%af%39%b5%7d%ac%64%4a%05%d1&peer_id=%33%33%c8%98%29%80%d5%70 +%93%46%75% ae%e2%3f%3b%7d%cb%78%71%4f&key=%ee%58%0f%93%9e%00%11%56&port=6881&uplo +aded=0&dow nloaded=0&left=0&event=started&compact=1 Interval 1800 3: Server ready... 5: New peer connected 192.168.1.156 Time 1115237207 4 192.168.1.156:6881 (OUT)[Cibr 1s( 20bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 1 Peers, Download 0bps Upload 0bps^M ---- Time 1115237208 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237209 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237210 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237211 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237212 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237213 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237214 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237215 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ---- Time 1115237216 4 192.168.1.156:6881 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 5 192.168.1.156:0 (ALL)[Cibr 1s(122bps)^-001+0][Cibr 1s(122 +bps)_0] 100% (390 of 390) 2 Peers, Download 0bps Upload 0bps^M ----

In the real script, the stuff between the 'Time' line and the '----' line just keeps repeating (sometimes with more peers, depending on who is connected, but that isn't really important for this discussion). With 67 torrents seeding, at 5 lines each per second, that is at least 335 lines per second, and the real script uses about 25% cpu on a celeron 700 (OpenBSD 3.6 - perl, v5.8.5 built for i386-openbsd)

--
andrew

Edit by tye: Changed PRE to CODE around long lines

Replies are listed 'Best First'.
Re: Benchmark doesn't tell me what I want to hear
by ikegami (Patriarch) on May 04, 2005 at 23:07 UTC

    The first run of the first test will suck <DATA> dry. Every other run of that test and every run of the other tests will never call the function being benchmarked. Try:

    my @data = <DATA>; my $result = timethese($count, { i_sep => sub { my %heap; foreach (@data) { i_sep( \%heap, $_) } }, i_peers => sub { my %heap; foreach (@data) { i_peers(\%heap, $_) } }, i_noop => sub { my %heap; foreach (@data) { i_noop( \%heap, $_) } }, });

    It's good to check the result of tests by adding debug statememnts that will be removed when the tests are known to work.

      Yup, that was it. Thank you very much!
      --
      andrew
Re: Benchmark doesn't tell me what I want to hear
by eibwen (Friar) on May 04, 2005 at 23:14 UTC

    If you're trying to locate the bottleneck in your code, try using Devel::DProf.

    While your Benchmark results may be informative, they must be considered within the context of your code, with particular emphasis on the relative number of iterations.

      I did try that, but ran into issues with it. I would guess it is because I am using POE and forking and such. I may pull out the sub I know it spends most of its time in into a test script and try to profile just that.

      $ perl -d:DProf SeedTorrents.pl
      Memory fault (core dumped)
      
      --
      andrew
Re: Benchmark doesn't tell me what I want to hear
by graff (Chancellor) on May 04, 2005 at 23:29 UTC
    If I understand what your saying, I think you'll get better mileage if you start with profiling instead of benchmarking:
    • Benchmarking is best for deciding whether two or more ways of doing the same thing are significantly different in how long they take to do that thing -- it's the tool you use when you believe that a current implementation is taking too long and an alternate method should be quicker, because benchmarking will tell you whether your belief is justified.
    • Profiling is best for dissecting the performance of a functioning script, in order to analyze where most of the time is spent; it's the tool you use to find out which segment(s) of your code (if any) will make the likeliest target(s) for optimization, because it doesn't make sense to optimize parts that are already pretty fast. Check "perldoc Devel::DProf" for more info on how to do profiling.

    So if you do the profiling first, you'll have a better idea of which portions of code might need to be done some other way, and then you can benchmark the original and alternative methods for these portions to see whether the optimization works.