in reply to constructing large hashes

You may (or may not) find enlightenment here.

Perl's hashing algorithm may be coming up with the same hash for each key. After the hash is constructed, try print scalar %arrangements; to see how many buckets are used/allocated for the hash.

Update: Take a look at this:

#!/usr/bin/perl use strict; use warnings; use Algorithm::FastPermute; #uncomment the following line and comment out the next to see the diff #my @array = (0 .. 7); my @array = (100_000 .. 100_007); my ($key, %perm); permute {$key = join '',@array;$perm{$key} = 1} @array; print scalar %perm;

Using 0..7, the code runs very quickly and uses 3704 out of 8192 buckets for 40_320 values. However, using 100_000..100_007 it uses 4 out of 8 buckets (for the same number of values) and runs very slowly - effectively turning perl's fast hashing algorithm into an expensive linear search.

Replies are listed 'Best First'.
Re: Re: constructing large hashes
by LEFant (Scribe) on Oct 01, 2002 at 01:36 UTC

    I agree that determining that the set of keys is not perverse is good first step in understanding poor hash performance. We assume that you asked the more important question...is the hash is the most effective way to the solve your problem. After that, some degree of optimization can be obtained by pre-sizing the hash. Perl may be reorganizing the hash at one or more points in populating it.

    If you can calculate the number of keys Perl provides a way for you to reveal the number of keys you will insert. Use keys(%hash_name) = number to provide that estimate. You may want to use a number smaller or larger than the number of keys depending on how Perl uses this hint. (Read the source or experiment?)

      ++LEFant Cool! At first I didn't think pre-sizing the hash would make any difference in the number of buckets or how they were used - but it sure does.
Re: Re: constructing large hashes
by blssu (Pilgrim) on Sep 30, 2002 at 21:55 UTC

    That's an interesting thought -- even though all the hash keys are different, there may be many collisions because the number of different characters is small. Does anybody know if perl's hash function is sensitive to collisions on permutations?

    For example, will things like "123", "213" and "312" hash far apart? If the character position is completely ignored, those will collide. I'm certain perl is not that bad, but 8 character permutations put a lot of stress on the hash function.

      I've used a hash with around 26 million key/value pairs in the past.
      The keys were incrementing digits, so there would have been many cases similar to your 123, 213, 312 scenario, but it didn't appear to cause any problems.

      That was Perl-5.6.1 on Solaris.

Re: Re: constructing large hashes
by duelafn (Parson) on Oct 01, 2002 at 00:20 UTC

    Ah Ha! Of course. I guess that I should have thought of that. Anyway, after trying several different separators (',', ' ', '_', '-', '.', '0') and even trying padding with zeroes (01, 02, ..., 08), the only solution that works well is no separator which makes it more difficult when we move to permutations of 10 or more things. Although from blssu's results below it looks as though some of this was fixed in perl 5.8 (I'm still limping along at 5.6.1).

    Thanks!
        Dean


    If we didn't reinvent the wheel, we wouldn't have rollerblades.

      I had another thought on the way home from work. You don't need the temp array, just build the hash in the code block for permute!

      permute { $temp = join(',',@n); $arrangement{$temp} = undef;} @n; # create the hash here --^^^^

      That should save some memory - and on my system, about a minute. As far as the separators, maybe pack the permutation into the key?

      permute {  $arrangements{pack "C*", @n} =1 } @n;

      is almost instantaneous. Won't help with 10 or more...

      Ah Ha! Of course. I guess that I should have thought of that. Anyway, after trying several different separators (',', ' ', '_', '-', '.', '0') and even trying padding with zeroes (01, 02, ..., 08), the only solution that works well is no separator which makes it more difficult when we move to permutations of 10 or more things. Although from blssu's results below it looks as though some of this was fixed in perl 5.8 (I'm still limping along at 5.6.1).
      As I'm now (attempting to) learn C, maybe this is obvious to me because it's a common idiom in C: use letters instead. You can use the ord function to get a mapping between the letters and the numbers, and you can even normalize the set [a-z] -> [0-25] by subtracting ord("a"). Also, you wouldn't have to change much else (from what I can see, anyways). For instance:
      $n = 8; @n = (1..$n);
      becomes
      $n = ord("a")+8; @n = ("a"..$n);
      Of course, this all assumes that you don't do many explicit aritmetic calculations on your set of permutations. If you do, the repeated calls to ord and char my be prohibitive.

      thor