http://qs1969.pair.com?node_id=309111


in reply to A (memory) poor man's <strike>hash</strike> lookup table.

There appears to be a bug in your code which is masked by the data you chose and the test you made. Your search only matches the start of the key, when it should match the end as well. For example with your data, a search for the key 12345 will look in bin 1234 for a string matching $/.'12345', but it could find a string like 1234567 and still return a match.

Also, for interest sake, you can reduce the memory footprint further by stripping out the bucket index from each of the keys. Using the above example again, if we wanted to store 1234567, we can just store 567 in the bucket keyed by 1234. The trivial case of storing 1234 will be stored as an empty string, which will be found correctly by searching for $/$/.

The following code reduced the footprint by another couple of megs. This reduction should also speed up the searching somewhat.

#!/usr/bin/perl $hash{ substr $_, 0, 4 } .= $/ . substr($_, 4) for 1 .. 1000000; $hash{$_} .= $/ foreach keys %hash; $n=0; print time, $/; for (1 .. 1000000) { next unless exists $hash{ substr $_, 0, 4 } and 1+index( $hash{ substr $_, 0, 4 }, $/.substr($_, 4).$/ ); $n++; } print time, $/, $n;

Other than that, it is an interesting concept that I can't see myself ever using :)

- Cees

Replies are listed 'Best First'.
Re: Re: A (memory) poor man's <strike>hash</strike> lookup table.
by BrowserUk (Patriarch) on Nov 22, 2003 at 10:01 UTC

    Indeed you are correct, my original code is flawed as you describe. Thanks for pointing out my error Cees++

    I've re-run a subset of the tests using the following slightly modfied version which corrects this error and the correction slows the insertion time by approximately 1 second for the million keys. The iteration time comes out exactly the same.

    print time; $hash{ substr $_, 0, 4 } .= $/. $_ for 1 .. 1_000_000; $hash{$_} .= $/ for keys %hash; # Append a final separator print time; 1069484550 1069484566 # 16 seconds. $n=0; print time; for ( 1 .. 1_000_000 ) { # Ensure no partial matches by prepending and appending # separators to the search key. next unless exists $hash{ substr $_, 0, 4 } and 1+index( $hash{ substr $_, 0, 4 }, $/.$_.$/ ); $n++; } print time, $/, $n; 1069484612 1069484656 # 44 secs. 1000000

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!
    Wanted!