in reply to Re: the if statement
in thread the if statement

  • Is there any efficiency advantage in either approach?
  • Does the hash size make a difference?
  • How about when done 5000 times for different $input values?

I personally believe that there's no significant advantage of one approach over the other, except perhaps as a matter of personal taste. As far as efficiency is concerned, what do you mean? Speed of execution? If so, I wouldn't worry about it, since the difference is so tiny, but you can answer your question(s) yourself with Benchmark.pm!

I notoriously suck at benchmarks and keep making mistakes (which are generally pointed out by others...), but here's my try anyway:

#!/usr/bin/perl

use strict;
use warnings;
# use 5.010;

use Benchmark qw/cmpthese :hireswallclock/;

{
    my @chr=('a'..'z', 'A'..'Z');
    sub genkeys {
        map {
            join '' => map $chr[rand @chr], (1) x (5 + rand 10);
        } 1..shift;
    }
}

my %hash=map {$_ => 1} genkeys 5000;
my @test=(keys %hash, genkeys 5000);

{
    $|++;
    local $\="\n";
    open my $fh, '>', '/dev/null'
        or die "Can't open /dev/null: $!\n";

    cmpthese -60 => {
        assign => sub () {
            for my $input (@test) {
                if (my $ans = $hash{$input}) {
                    print $fh "$input => $ans";
                }
            }
        },
        double => sub () {
            for my $input (@test) {
                if (exists $hash{$input}) {
                    print $fh $hash{$input};
                }
            }
        }
    };
}

__END__

As I expected, as written it doesn't show any noticeable difference.

kirk:~ [15:16:04]$ ./bm.pl
         Rate double assign
double 29.4/s     --    -1%
assign 29.7/s     1%     --
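One caveat worth spelling out: the two tests take the same branches here only because every value stored in %hash is 1. A minimal sketch of where they diverge, using a hypothetical %h that also holds false values (the names %h, hit_by_assign and hit_by_exists are illustrative and not part of the benchmark):

```perl
use strict;
use warnings;

# Hypothetical data: one true value, one false value, one undef value.
my %h = (a => 1, b => 0, c => undef);

# "assign" style: the branch is taken only if the stored value is true.
sub hit_by_assign {
    my ($key) = @_;
    if (my $ans = $h{$key}) { return 1 }
    return 0;
}

# "double" style: the branch is taken whenever the key is present.
sub hit_by_exists {
    my ($key) = @_;
    return exists $h{$key} ? 1 : 0;
}

for my $key (qw/a b c d/) {
    printf "%s: assign=%d exists=%d\n",
        $key, hit_by_assign($key), hit_by_exists($key);
}
```

So the choice between the two is about what a "hit" should mean when a stored value can be false, rather than about speed.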

Indeed, I generally resist the temptation to run benchmarks "like this" when someone suggests them, and I even tend to bash, slightly, those who do. This time I was curious to see whether at least a tiny systematic difference would show up, but that doesn't seem to be the case. Feel free to modify it however you like, though!

--
If you can't understand the incipit, then please check the IPB Campaign.

Re^3: the if statement
by Wiggins (Hermit) on Sep 29, 2008 at 18:37 UTC
    I love your answer. Below I have put my result of running your benchmark in a Fedora9 VM on a Lenovo laptop.

    But what really intrigues me is how you built the test hash and keys to test with... 2 'map's in 2 lines of code. Figuring out 'genkeys' and the %hash, and @test values will take me the rest of the afternoon; thanks

    ### using 5.010
    [~]# time perl hash_test_benchmark.pl
            Rate assign double
    assign 119/s     --    -2%
    double 122/s     2%     --

    real    2m30.601s
    user    2m17.789s
    sys     0m6.317s

    ### without 5.010
    [~]# time perl hash_test_benchmark.pl
            Rate double assign
    double 121/s     --    -0%
    assign 122/s     0%     --

    real    2m32.598s
    user    2m19.745s
    sys     0m6.498s
    [~]#
      I love your answer. Below I have put my result of running your benchmark in a Fedora9 VM on a Lenovo laptop.

      I personally believe this just shows that the benchmark itself is not meaningful, or rather that it is meaningful only in showing that there's no significant difference between the two "techniques". Take it as a reminder not to bother in the future: benchmark only when you have genuinely different algorithms to start with...

      You may find much more interesting benchmarks in another recent thread...

      But what really intrigues me is how you built the test hash and keys to test with... 2 'map's in 2 lines of code. Figuring out 'genkeys' and the %hash, and @test values will take me the rest of the afternoon; thanks

      What's so difficult to understand? I hope I can help clarify: %hash and @test are a plain hash and a plain array respectively. Since they're lexical variables, the subs used in the benchmark are closures over them.
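      A minimal sketch of what "closure" means here: an anonymous sub defined while a lexical is in scope keeps access to that variable. The names %seen and $lookup are illustrative only and do not appear in the benchmark.

```perl
use strict;
use warnings;

my $lookup;
{
    # %seen is lexical to this block...
    my %seen = (foo => 1, bar => 1);

    # ...but the anonymous sub captures it, so it stays reachable
    # through $lookup after the block ends.
    $lookup = sub { exists $seen{ $_[0] } ? 1 : 0 };
}

print $lookup->('foo'), "\n";   # key is present
print $lookup->('baz'), "\n";   # key is absent
```

      The subs passed to cmpthese see %hash and @test in the same way.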

      genkeys() takes a whole number $n and returns that many random strings, each with a length hardcoded to fall between 5 and 14. Since genkeys() makes no attempt to remove duplicates from its return list, %hash has at most 5000 keys, but it may have fewer. @test has all these keys plus 5000 more strings, and it may contain duplicates. I wanted a test array of "input" values such that about half of them succeed and about half fail.
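      These properties can be checked directly. The sketch below reuses genkeys() as defined in the benchmark and counts the distinct keys and the expected hits. The names $nkeys, %uniq and $hits are illustrative, and the exact numbers vary from run to run because the keys are random.

```perl
use strict;
use warnings;

my @chr = ('a'..'z', 'A'..'Z');

# Same generator as in the benchmark: N random strings of 5 to 14 letters.
sub genkeys {
    map {
        join '' => map $chr[rand @chr], (1) x (5 + rand 10);
    } 1..shift;
}

my %hash = map { $_ => 1 } genkeys(5000);
my @test = (keys %hash, genkeys(5000));

my $nkeys = keys %hash;                       # at most 5000
my %uniq;
$uniq{$_}++ for @test;                        # collapse duplicates in @test
my $hits = grep { exists $hash{$_} } @test;   # inputs that will succeed

printf "hash keys: %d, test items: %d (distinct: %d), hits: %d\n",
    $nkeys, scalar @test, scalar(keys %uniq), $hits;
```

      With 52 letters and at least 5 characters per string, collisions should be rare, so the counts would normally come out at or very near 5000, 10000, 10000 and 5000.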

      Coming to genkeys(), analyze it top-down; it's simply of the form

      sub genkeys { map { CODE } 1..shift; }

      with CODE being:

      join '' => map $chr[rand @chr], (1) x (5 + rand 10);

      The former takes a list whose length is the supplied argument and applies CODE to each element. Since $_ is not actually used in CODE, the values of the elements don't matter, only the length of the list, and it might just as well have been e.g. (1) x shift. In the latter, similarly, I build a list of arbitrary thingies of length between 5 and 14. Then map turns that into a list of the same length of random characters taken from the @chr array, and join... err... well, joins them into a string of that length. As you can see, it's not that esoteric after all...
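      For comparison, here is a longhand sketch of the same idea with explicit loops in place of the two maps. genkeys_verbose() is a hypothetical name, not part of the original script, and is meant only as a reading aid.

```perl
use strict;
use warnings;

my @chr = ('a'..'z', 'A'..'Z');

sub genkeys_verbose {
    my ($n) = @_;
    my @keys;
    for (1 .. $n) {                     # outer map: one string per element
        my $len = 5 + int rand 10;      # 5 plus 0..9, i.e. 5 to 14
        my $str = '';
        for (1 .. $len) {               # inner map: one character per element
            $str .= $chr[rand @chr];    # the array index truncates rand's result
        }
        push @keys, $str;
    }
    return @keys;
}

print "$_\n" for genkeys_verbose(3);
```

      The original gets away without int because the repetition operator x truncates its fractional count in the same way an array index does.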

      --
      If you can't understand the incipit, then please check the IPB Campaign.