Re: Re: Re: Re: Zip Code Script

Update: I can't reproduce these results, so ignore this post. As merlyn notes, a single search against an array is going to be more efficient by examining the array than building a hash. If you're going to be doing multiple searches, though, you're probably better off building a hash.

A quick Benchmark shows the spot on my system to be at 5 elements (comparing a linear scan versus building a hash):

 1   0 wallclock secs (-0.89 usr +  0.00 sys = -0.89 CPU)
 2   0 wallclock secs (-1.00 usr +  0.00 sys = -1.00 CPU)
 3  -1 wallclock secs (-0.67 usr +  0.00 sys = -0.67 CPU)
 4   1 wallclock secs (-0.35 usr +  0.01 sys = -0.34 CPU)
 5   1 wallclock secs (-0.05 usr +  0.01 sys = -0.04 CPU)
 6   0 wallclock secs ( 0.32 usr +  0.00 sys =  0.32 CPU)
 7   2 wallclock secs ( 0.68 usr +  0.00 sys =  0.68 CPU)
 8   2 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU)
 9   1 wallclock secs ( 1.33 usr +  0.00 sys =  1.33 CPU)
10   3 wallclock secs ( 1.70 usr +  0.00 sys =  1.70 CPU)
[download]

So yah, you're better off going with a linear scan unless you're going to be working with more than a handful of elements.

Comment on Re: Re: Re: Re: Zip Code Script Download Code

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: Zip Code Script by merlyn (Sage) on Dec 13, 2000 at 01:13 UTC
Show your code. Please include the building of the hash in that code, since that was the cost I was looking at: one linear scan of an array, vs one hash lookup in a hash built from that same data. I'd be incredibly surprised if there was ever a time when building the hash would ever be faster. I mean, it goes back to my "how much info did you throw away at the end?" theory that I talked about here some time ago. -- Randal L. Schwartz, Perl hacker	[reply]
Re: Re: Re: Re: Re: Re: Zip Code Script by Fastolfe (Vicar) on Dec 13, 2000 at 01:25 UTC
I am building the hash in the test, but I left the construction of the array outside of it: use Benchmark qw{ timeit timediff timestr }; use strict; my @array; for (1 .. 10) { push(@array, $_); my $time = timeit(500000, "&search_array($_)"); my $time2 = timeit(500000, "&search_hash($_)"); printf("%2d %s\n", $_, timestr timediff($time, $time2)); } sub search_array { my $max = shift; my $item; my $looking = int(rand($max)); foreach $item (@array) { return 1 if $item == $looking; } return; } sub search_hash { my $max = shift; my %hash; @hash{@array} = (); return 1 if exists $hash{rand($max)}; return; } [download] Actually this sucks. I made some stuff more readable and my results now differ from what I was seeing earlier. I can't reproduce my old results. So yah, it looks like you're right. It's an increasing trend in favor of a linear search. My bad.	[reply] [d/l]