Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: Making a match

by DrManhattan (Chaplain)
on Apr 12, 2003 at 13:30 UTC ( [id://250052]=note: print w/replies, xml ) Need Help??


in reply to Making a match

If you're checking $x aginst your list multiple times, it might be more efficient to use a hash:
my @list = (10,11,12,13,14,15,17,18,22,23,24,25,26,27,28,29,30,31,32,3 +3,34,35 ); my %hash; @hash{@list} = (1) x @list; # This solution only makes sense if you're # making the comparison multiple times foreach my $x (1..1000) { if ($hash{$x}) { print "$x is in the list\n"; } }
You spend some initial setup time creating the hash, but then all your comparisons are O(1).

-Matt

Replies are listed 'Best First'.
Re: Re: Making a match
by TheDamian (Vicar) on Apr 13, 2003 at 00:40 UTC
    Hashes certainly are the best solution, if you can amortize the set-up costs (and only slightly worse than grep even if you have to set up repeatedly -- i.e. if the list keeps changing).

    But if ultimate efficiency is the goal, then there is an even more efficient way to use hash lookup:

    my %hash; @hash{@list} = (); # Don't bother filling in values foreach my $x (1..1000) { if (exists $hash{$x}) # Just test for existence of key { print "$x is in the list\n"; } }
      ++ to you and perlplexer. I hadn't realized you could do it that way. Benchmark.pm shows defining the hash is over twice as fast as assigning values to it. The benchmark results on the comparison techniques are more interesting:
      if ($hash{$x} == 1) { # always slower than if($hash{$x}) # or if(defined($hash{$x})) } if ($hash{$x}) { # slower than if(defined($hash{$x})) # when $hash{$x} is undefined # faster than if(defined($hash{$x})) # when $hash{$x} is defined }
      Here's the benchmark code I used:
      #!/usr/bin/perl use strict; use Benchmark; my @list = (10,11,12,13,14,15,17,18,22,23,24,25,26,27,28,29,30,31,32,3 +3,34,35 ); my %hash; @hash{@list} = (1) x @list; timethese(1000000, { 'Hash Defined Miss' => sub { if (defined($hash{1})) { #nop } }, 'Hash Defined Hit' => sub { if (defined($hash{10})) { #nop } }, 'Hash Assigned Miss Truth' => sub { if ($hash{1}) { #nop } }, 'Hash Assigned Miss Equality' => sub { if ($hash{1} == 1) { #nop } }, 'Hash Assigned Hit Truth' => sub { if ($hash{10}) { #nop } }, 'Hash Assigned Hit Equality' => sub { if ($hash{10} == 1) { #nop } } });
      Benchmark: timing 1000000 iterations of Hash Assigned Hit Equality, Ha +sh Assigned Hit Truth, Hash Assigned Miss Equality, Hash Assigned Miss Truth, Hash Define +d Hit, Hash Defined Miss... Hash Assigned Hit Equality: 5 wallclock secs ( 1.52 usr + 0.00 sys = + 1.52 CPU) @ 657894.74/s (n=1000000) Hash Assigned Hit Truth: 3 wallclock secs ( 1.29 usr + -0.01 sys = 1 +.28 CPU) @ 781250.00/s (n=1000000) Hash Assigned Miss Equality: 3 wallclock secs ( 1.30 usr + 0.00 sys += 1.30 CPU) @ 769230.77/s (n=1000000) Hash Assigned Miss Truth: 1 wallclock secs ( 1.15 usr + 0.01 sys = +1.16 CPU) @ 862068.97/s (n=1000000) Hash Defined Hit: 1 wallclock secs ( 1.41 usr + 0.00 sys = 1.41 CPU +) @ 709219.86/s (n=1000000) Hash Defined Miss: 2 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CP +U) @ 1052631.58/s (n=1000000)

      -Matt

        (Sigh. I just can help myself, can I? Always with the advice... ;-)

        Two suggestions for using Benchmark:

        1. If you're using long names for each test, pad the names to the same length with extra spaces on the left. Then your results are much easier to read and compare:
          Hash Exists Hit: 4 wallclock secs @ 1204819.28/s (n=10000 +00) Hash Defined Hit: 3 wallclock secs @ 2173913.04/s (n=10000 +00) Hash Exists Miss: 1 wallclock secs @ 1515151.52/s (n=10000 +00) Hash Defined Miss: 3 wallclock secs @ 1492537.31/s (n=10000 +00) Hash Assigned Miss Truth: 6 wallclock secs @ 1515151.52/s (n=10000 +00) Hash Assigned Hit Equality: 4 wallclock secs @ 1149425.29/s (n=10000 +00) Hash Assigned Hit Truth: 5 wallclock secs @ 970873.79/s (n=100000 +0) Hash Assigned Miss Equality: 6 wallclock secs @ 877192.98/s (n=100000 +0)
        2. Use cmpthese instead of timethese. That adds a handy comparison matrix at the end of your test results (though this suggestion works much better with shorter names):
          Rate HDH HAHE HAME HEH HAMT HDM HAHT HEM HDH 1111111/s -- -0% -4% -21% -24% -28% -30% -39% HAHE 1111111/s 0% -- -4% -21% -24% -28% -30% -39% HAME 1162791/s 5% 5% -- -17% -21% -24% -27% -36% HEH 1408451/s 27% 27% 21% -- -4% -8% -11% -23% HAMT 1470588/s 32% 32% 26% 4% -- -4% -7% -19% HDM 1538462/s 38% 38% 32% 9% 5% -- -3% -15% HAHT 1587302/s 43% 43% 37% 13% 8% 3% -- -13% HEM 1818182/s 64% 64% 56% 29% 24% 18% 15% --

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://250052]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (1)
As of 2024-04-25 04:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found