speeding up a hash lookup

chinman has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,

Let me first say that this site rocks! I'm a Perl newbie and have found the information here to be extremely useful in learning Perl.

Right, then! I've got a hash of x,y data (mass = keys,intensities = values). In my script, I search for a mass key that represents the largest intensity within a window of +- 0.3 mass units. I've populated my hash with mass data to the nearest 0.1 units. Is there anyway to extract the keys matching the tolerance without the for loop. I'm doing lots of these calls in my script, so any speed gained here would be helpful. Cheers!

#subroutine that gets the largest intensity of the mass ($_[0]
#within a mass tolerance window of +- 0.3.

sub getIntensity {
   my $mLo = $_[0] – 0.3;
   my $mHi = $_[0] + 0.3;
   my $intensity = 0;
   for ( my $i = $mLo; $i <= $mHi; $i += 0.1 ) {
       $i = sprintf("%.1f", $i);
       if ( exists $massint{ $i } ){
           if ( $massint{ $i } > $intensity ){
               $intensity = $massint{ $i }
           }
        }
   }
   return $intensity;
}
[download]

Comment on speeding up a hash lookup Download Code

Replies are listed 'Best First'.
Re: speeding up a hash lookup by arturo (Vicar) on Mar 27, 2001 at 20:50 UTC
I'm not sure it would be faster, but you can use the nifty `grep` operator in conjunction with the `keys` function to achieve similar effects: `sub getIntensity { my $target = shift; # => $_[0], idiomatic my $tol = .3; my @matching_keys = grep { abs($target-$_) <= $tol } keys %massint; my $intensity = 0; foreach (@matching_keys) { $intensity = $massint{$_} > $intensity ? $massint{$_} : $intensity; } $intensity; }` [download] First, you grab the keys within the tolerance range, and then you go through them to nab the greatest value (the ternary operator EXPR ? EXPR1 : EXPR2 returns EXPR1 if EXPR is true, and EXPR2 otherwise). You could, of course, do away with `@matching_keys` and run the foreach over the grep statement, but that doesn't illustrate the technique as clearly. HTH Philosophy can be made out of anything. Or less -- Jerry A. Fodor	[reply] [d/l] [select]
Re: speeding up a hash lookup by japhy (Canon) on Mar 28, 2001 at 01:31 UTC
I created a caching algorithm for your data set. I'll reproduce it here, with comments. It works like so. It determines the max value over the +/- 0.3 window for each mass in the range LOW - 0.3 to HIGH + 0.3, and it also remembers WHICH mass it found that intensity at. This way, when it gets to the next number, it knows where to start looking -- there's no need to look BEFORE the highest value for a higher value, since it would've been registered as the highest value. my $cache = cacheMaxIntensity(\%massint); sub cacheMaxIntensity { my $mi = shift; my ($min, $max); # determine the maximum and minimum masses # in the hash while (my ($m) = each %$mi) { $min = $m if not defined($min) or $min > $m; $max = $m if not defined($max) or $max < $m; } # the "latest mass" is defined as the mass within a # +/- 0.3 range with the highest intensity my $latest = $min - 0.3; my %cache = (); # cycle through all the masses, including # 3 less than the min and 3 greater than the max for (my $m = $min - 0.3; $m <= $max + 0.3; $m += 0.1) { # ensure "ABC.D" format (icky non-integers) $m = sprintf("%.1f", $m); # update the value of $latest if it's too # far away from the current mass $latest = sprintf("%.1f", $m - 0.3) if $m - 0.3 > $latest; # build the range (latest .. current + 0.3) # of other masses to examine for my $d ( map sprintf("%.1f", $_ / 10), (10 * $latest) .. (10 * $m + 3) ) { # set the cached intensity for this mass # to the intensity for $d IF: # there is an intensity for $d # AND # there is no cached intensity yet # OR # the intensity for $d is greater than # the cached value $cache{$m} = [ $mi->{$d}, $latest = $d ] if exists $mi->{$d} and ( not($cache{$m}) or $cache{$m}[0] < $mi->{$d} ); } } # return a reference to the cache return \%cache; } [download] The corollary function, to get the maximum intensity in a range of +/- 0.3 mass, is rather simple: `my $max_I = maxIntensity($cache, $mass); sub maxIntensity { my ($c, $m) = @_; return $c->{sprintf "%.1f", $m}[0]; }` [download] I hope this helps. It took a bit of time for me to write the caching function, but it appears to hold up to my testing. `japhy` -- Perl and Regex Hacker	[reply] [d/l] [select]
Re: speeding up a hash lookup by jeroenes (Priest) on Mar 27, 2001 at 21:13 UTC
Well, it depends a little on how many keys you have, but I have had good experiences with BerkeleyDB {1}. For 2k keys, I experienced a 40 times increase in speed (for the look-up, didn't test the insert) with the Berkeley BTree hash in comparison to perl's hash lookup. Hope this helps, Jeroen "We are not alone"(FZ) {1} I have plugged this one quite often the last few months, but it is really a great tool. You can read about my problems at the time here.	[reply]
Re: speeding up a hash lookup by physi (Friar) on Mar 27, 2001 at 21:00 UTC
I don't know if this is really faster, but a bit shorter: `sub getIntensity { my $mLo = $_[0] – 0.3; my $mHi = $_[0] + 0.3; my $intensity = 0; map { if ($_ >= $mLo && $_ <= $mHi) { $intesity = $massint{$_} > $intensity ? $massint{$_} : $inten +sity} } keys %massint; return $intensity; }` [download] hope it help's :^)	[reply] [d/l]
Re: speeding up a hash lookup by jbert (Priest) on Mar 27, 2001 at 22:19 UTC
Depending on your application, you can trade memory for speed. So if all you are really interested in is the information you extract above, you can pre-calculate and store the results. Or even just cache results like this if you will call repeatably... `my %maxIntesityCache; sub getIntensity { my $m = shift; # Do we know this already? return $maxIntesityCache{$m} if defined $maxIntesityCache{$m}; ...as for your function, except you can probably change the core to remove the exists test and/or one of the other suggestions in this thread... # Remember this for later $maxIntensityCache{$m} = $intensity; return $intensity; }` [download] Precalculation may give you a win because you can run through the keys once, keeping a window from -0.3 to +0.3 open on your key at all times. Another thought...if all your numeric work is to precision '0.1' you might get a nice speedup by having your key as 'mass*10' so you are only storing integers. You can then: avoid the sprintf you are using to control precision issues, which must hurt you now I come to think of it you can switch perl into 'use integer' mode which might help too. Good luck...	[reply] [d/l]
Re: speeding up a hash lookup by MeowChow (Vicar) on Mar 27, 2001 at 22:09 UTC
A hash is a rather poor fit to this task. You should consider using a search tree of some sort. MeowChow s aamecha.s a..a\u$&owag.print	[reply]
Re: speeding up a hash lookup by chinman (Monk) on Apr 01, 2001 at 03:00 UTC
Thanks to everyone for your suggestions! As expected, the cache routine (thanks japhy!) turned out faster when making lots of calls to get intensity. When only making a few calls, my original code is faster than the other suggestions (by typically 10X). Thanks again... `#!/usr/bin/perl -w $camel = $hump do { theHumpty($camel); }` [download]	[reply] [d/l]

Back to Seekers of Perl Wisdom