in reply to Re^5: RFC: Is there a solution to the flaw in my hash mechanism? (p-1?)
in thread RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)

The point of the outer loop:

    for( my $i=0; $i<17; ++$i ) {
        my $j = 1 + ( $i % ( 17 - 1 ) );
        printf "%2u: %s\n", $i, join ' ', map{ sprintf "%2u", $j = ( $j + $i ) % 17 } 0 .. 16;
    };;

with $i running 0 .. 16, is that $i takes on every value that the first modulo-prime reduction can produce. I.e., $i is the initial insertion point.

In the original examples, $j was simply a copy of $i, so that $i can retain the value of the initial insert point for use once $j has been modified: i) for the row label; ii) because it is reused in the calculation of the next insertion point, $j = ( $j + $i ) % 17, in the retry loop.
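A minimal sketch of that retry calculation, to make the flaw in the thread title concrete. The helper name probe_sequence and its argument order are made up for illustration; only the expression ( $j + $step ) % $p comes from the code above. With a prime table size any non-zero step eventually reaches every slot, whereas a step of 0 never leaves the initial insertion point.

```perl
use strict;
use warnings;

# Hypothetical helper: list the $p slots reached by repeatedly adding $step
# to $start modulo $p, mirroring $j = ( $j + $i ) % 17 in the retry loop.
sub probe_sequence {
    my( $start, $step, $p ) = @_;
    my $j = $start;
    my @slots;
    for ( 1 .. $p ) {
        $j = ( $j + $step ) % $p;
        push @slots, $j;
    }
    return @slots;
}

# Count the distinct slots reached for a non-zero step and for a step of 0.
for my $step ( 3, 0 ) {
    my %seen = map { $_ => 1 } probe_sequence( $step, $step, 17 );
    printf "step %u reaches %u distinct slot(s)\n", $step, scalar keys %seen;
}
```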

What you've done by re-casting the code (which certainly works BTW), is effectively the same as this (with an offset) :

    for( my $i=0; $i < 17; ++$i ) {
        my $j = $i ||= 1;
        printf "%2u: %s\n", $i, join ' ', map{ sprintf "%2u", $j = ( $j + $i ) % 17 } 0 .. 16;
    };;
     1:  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16  0  1
     2:  4  6  8 10 12 14 16  1  3  5  7  9 11 13 15  0  2
     3:  6  9 12 15  1  4  7 10 13 16  2  5  8 11 14  0  3
     4:  8 12 16  3  7 11 15  2  6 10 14  1  5  9 13  0  4
     5: 10 15  3  8 13  1  6 11 16  4  9 14  2  7 12  0  5
     6: 12  1  7 13  2  8 14  3  9 15  4 10 16  5 11  0  6
     7: 14  4 11  1  8 15  5 12  2  9 16  6 13  3 10  0  7
     8: 16  7 15  6 14  5 13  4 12  3 11  2 10  1  9  0  8
     9:  1 10  2 11  3 12  4 13  5 14  6 15  7 16  8  0  9
    10:  3 13  6 16  9  2 12  5 15  8  1 11  4 14  7  0 10
    11:  5 16 10  4 15  9  3 14  8  2 13  7  1 12  6  0 11
    12:  7  2 14  9  4 16 11  6  1 13  8  3 15 10  5  0 12
    13:  9  5  1 14 10  6  2 15 11  7  3 16 12  8  4  0 13
    14: 11  8  5  2 16 13 10  7  4  1 15 12  9  6  3  0 14
    15: 13 11  9  7  5  3  1 16 14 12 10  8  6  4  2  0 15
    16: 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0 16

Which is intriguing.

I'd almost settled on having a separate (dynamic) spill array to hold the (probably rare, 0.0000005%) values that are congruent to 0 modulo the prime, and doing a linear search to check whether they are already there.

One upside of that idea is that it means I can use 0 in the main array to mean not allocated. If I don't go that route, I have to find another value to represent an empty slot; which would basically mean either reducing the possible range of inputs (exclude 2**64-1 for example), or using the spill array for that other value.
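A rough sketch of the spill-array idea, under several assumptions that are not in the post: a toy table of 17 slots, the names @table, @spill and insert, and a probe step equal to the initial insertion point as in the examples above. Values congruent to 0 modulo the prime (including 0 itself) never enter the main array, so 0 is free to mean "not allocated" there.

```perl
use strict;
use warnings;

my $P     = 17;           # prime table size, as in the examples above
my @table = ( 0 ) x $P;   # main array: 0 marks an unallocated slot
my @spill;                # values congruent to 0 modulo $P

# Hypothetical insert: returns 1 if $v was added, 0 if it was already present.
sub insert {
    my( $v ) = @_;
    my $i = $v % $P;                  # initial insertion point, also the step
    if( $i == 0 ) {                   # step would be 0, so use the spill array
        return 0 if grep { $_ == $v } @spill;   # linear search
        push @spill, $v;
        return 1;
    }
    my $j = $i;
    for ( 1 .. $P ) {
        if( $table[ $j ] == 0 ) { $table[ $j ] = $v; return 1 }
        return 0 if $table[ $j ] == $v;
        $j = ( $j + $i ) % $P;        # retry at the next insertion point
    }
    die "table full\n";
}

# 5 and 22 both start at slot 5; 34 and 0 are congruent to 0 modulo 17.
insert( $_ ) for 5, 22, 34, 0;
print "slot 5: $table[5], slot 10: $table[10], spill: @spill\n";
```

The linear search is only reasonable while the spill array stays tiny, which is the premise of the paragraph above.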

If I'm going to have to have a spill array anyway, it might as well be for the 0s; but now looking at the pattern of the insertions your code presents, I'm torn.

Thank you. And sorry I misunderstood you.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^7: RFC: Is there a solution to the flaw in my hash mechanism? ($o != $i)
by tye (Sage) on Jun 03, 2015 at 06:38 UTC
    And sorry I misunderstood you.

    No problem. My concern was just that my contribution might be helpful but for some extra explanation. No offense taken.

    is effectively the same as ... $i ||= 1

    Except that it distributes rather more evenly.

    which would basically mean either reducing the possible range of inputs (exclude 2**64-1 for example), or use the spill array for that other value

    I'm not convinced that either of those is required, but I'll refrain from trying to make a stronger case than that, at least at this point. In any case, it isn't hard to adjust the approach if using all but one (particular) slot until the hash is completely filled is somehow unacceptable.

    Actually, a nearly trivial adjustment has what can be a significant advantage in that it can reduce the collisions because different hash values that start at the same insertion point will likely follow different paths for subsequent insertion points.

        my $p = 17;
        for my $hash ( 0 .. 20 ) {
            my $i = $hash % $p;
            my $o = 1 + $hash % ($p-1);
            my $j = $i;
            printf "%2u: %2u %s\n", $hash, $i,
                join ' ', map { sprintf "%2u", $j = ( $j + $o ) % $p } 0 .. 16;
        }
        __END__
        hash 1st
         0:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16  0
         1:  1  3  5  7  9 11 13 15  0  2  4  6  8 10 12 14 16  1
         2:  2  5  8 11 14  0  3  6  9 12 15  1  4  7 10 13 16  2
         3:  3  7 11 15  2  6 10 14  1  5  9 13  0  4  8 12 16  3
         4:  4  9 14  2  7 12  0  5 10 15  3  8 13  1  6 11 16  4
         5:  5 11  0  6 12  1  7 13  2  8 14  3  9 15  4 10 16  5
         6:  6 13  3 10  0  7 14  4 11  1  8 15  5 12  2  9 16  6
         7:  7 15  6 14  5 13  4 12  3 11  2 10  1  9  0  8 16  7
         8:  8  0  9  1 10  2 11  3 12  4 13  5 14  6 15  7 16  8
         9:  9  2 12  5 15  8  1 11  4 14  7  0 10  3 13  6 16  9
        10: 10  4 15  9  3 14  8  2 13  7  1 12  6  0 11  5 16 10
        11: 11  6  1 13  8  3 15 10  5  0 12  7  2 14  9  4 16 11
        12: 12  8  4  0 13  9  5  1 14 10  6  2 15 11  7  3 16 12
        13: 13 10  7  4  1 15 12  9  6  3  0 14 11  8  5  2 16 13
        14: 14 12 10  8  6  4  2  0 15 13 11  9  7  5  3  1 16 14
        15: 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0 16 15
        16: 16  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
        17:  0  2  4  6  8 10 12 14 16  1  3  5  7  9 11 13 15  0
        18:  1  4  7 10 13 16  2  5  8 11 14  0  3  6  9 12 15  1
        19:  2  6 10 14  1  5  9 13  0  4  8 12 16  3  7 11 15  2
        20:  3  8 13  1  6 11 16  4  9 14  2  7 12  0  5 10 15  3

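To isolate the "different paths" point, here is a small sketch that compares two hash values sharing an initial insertion point. The helper name probe_path and its third argument are assumptions made for illustration; the start and offset formulas are the ones from the code above. Hash values 1 and 18 both start at slot 1, but their offsets (2 and 3) send them to different slots on the first retry.

```perl
use strict;
use warnings;

# Hypothetical helper: the first $n slots probed for $hash in a table of
# prime size $p, starting with the initial insertion point.
sub probe_path {
    my( $hash, $p, $n ) = @_;
    my $j = $hash % $p;               # initial insertion point
    my $o = 1 + $hash % ( $p - 1 );   # offset in 1 .. $p-1, never 0
    my @path = ( $j );
    while( @path < $n ) {
        $j = ( $j + $o ) % $p;
        push @path, $j;
    }
    return @path;
}

print join( ' ', probe_path(  1, 17, 4 ) ), "\n";   # 1 3 5 7
print join( ' ', probe_path( 18, 17, 4 ) ), "\n";   # 1 4 7 10
```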
    Thank you.

    You are most welcome, of course.

    - tye        

      I'm not convinced that either of those is required. But I'll refrain from trying to make a case stronger than that at least at this point.

      How would you distinguish between an empty slot and one that is in use, other than by reserving one value from the input range (0 .. 2**64-1) as a sentinel or telltale value?



        Sorry, I mistook the context of that statement due to lazy reading.

        Another approach would be a bitmap.
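A minimal sketch of the bitmap idea, assuming a toy table of 17 slots and the made-up names $in_use, slot_used and store_slot. One bit per slot records occupancy, so every value in 0 .. 2**64-1, including 0, remains a legal slot content. Perl's built-in vec is used here as the bit vector.

```perl
use strict;
use warnings;

my $P      = 17;
my @table  = ( 0 ) x $P;   # slot contents; any value, including 0, is legal
my $in_use = '';           # bit string: bit $n is 1 when slot $n is occupied

# Hypothetical accessors for the occupancy bitmap.
sub slot_used  { my( $n ) = @_; return vec( $in_use, $n, 1 ) }
sub store_slot {
    my( $n, $v ) = @_;
    $table[ $n ] = $v;
    vec( $in_use, $n, 1 ) = 1;
}

store_slot( 0, 0 );        # storing the value 0 is now unambiguous
printf "slot 0 used: %u, slot 1 used: %u\n", slot_used( 0 ), slot_used( 1 );
```

The cost is one extra bit per slot plus a second memory access on each probe.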

        You could even avoid a bitmap by, for example, sharding odd hash values into one hash table and even hash values into a second hash table. Then the 0 bit could indicate "empty" though the meaning of the value of that bit would be reversed between the two tables.
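One way to read the sharding suggestion, sketched with assumed names (@odd, @even, shard_for, slot_empty, insert) and a toy size of 17. The table for odd values is pre-filled with an even filler (0) and the table for even values with an odd filler (1), so a slot is empty exactly when its low bit disagrees with the parity its table stores. The probing reuses the start and offset formulas from earlier in the thread; that pairing is an assumption, not something the post specifies.

```perl
use strict;
use warnings;

my $P    = 17;
my @odd  = ( 0 ) x $P;   # holds odd values only, so low bit 0 means empty
my @even = ( 1 ) x $P;   # holds even values only, so low bit 1 means empty

# Pick the table that stores values with the same parity as $v.
sub shard_for { my( $v ) = @_; return ( $v & 1 ) ? \@odd : \@even }

# A slot is empty when its low bit differs from the parity of its table.
sub slot_empty {
    my( $v, $slot ) = @_;
    return ( shard_for( $v )->[ $slot ] & 1 ) != ( $v & 1 );
}

# Hypothetical insert: returns 1 if $v was added, 0 if it was already present.
sub insert {
    my( $v ) = @_;
    my $table = shard_for( $v );
    my $j = $v % $P;                  # initial insertion point
    my $o = 1 + $v % ( $P - 1 );      # offset, never 0
    for ( 1 .. $P ) {
        if( slot_empty( $v, $j ) ) { $table->[ $j ] = $v; return 1 }
        return 0 if $table->[ $j ] == $v;
        $j = ( $j + $o ) % $P;
    }
    die "table full\n";
}

# 0 and 34 are even and both start at slot 0; 17 is odd and also starts at slot 0.
insert( $_ ) for 0, 17, 34;
print "even[0]=$even[0] odd[0]=$odd[0] even[3]=$even[3]\n";
```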

        - tye