Binary heap

A few months ago I wrote a binary heap implementation. This meditation explains why I wrote a new one instead of using an existing solution, how I wrote it, and gives the code. I didn't make a complete reusable module if this, but I figure this might still be usefulf for someone.

The immediate reason I post this is that peacekorea asked a question where I mentioned in my reply that a using a heap might be useful. That made me think about what heap implementation I'd recommend.

Fortunately, I also had to think about this a few months ago, when I ended up writing this new implementation. I won't explain the exact situation why I needed a heap now (sorry), but it should be enough that

I had to be able to read and decrease the key of elements of the hash. This is only possible if you store the position of every element of the heap in a separate table (indexed by a secondary key), and update this table every time I move an element in the hash.

Let's see now what existing hash implementations I considered. I first turned to the heap script I wrote some time ago. This is rather incomplete (or specialized?), it lacks certain operations you can do on the heap. Thus, I couldn't reuse much of it, but I still looked at it a few times as it was familiar to me as I wrote it.

Then, I searched for "heap" on CPAN. I've found two modules: Heap, and Heap::Simple.

Heap::Simple has a very confusing documentation, as the SYNOPSIS section contains too much code. Thus, it took me some time to figure out how the module worked, but finally I've learned that it doesn't support the indexing the elements by a secondary key I've described above. Just let me note that the hash functions of the C++ STL can't do this either. This can be useful sometimes, as it can make the operations faster, but it made the module unsuitable for me.

Heap does support this kind of indexing, and also decreasing the key of entries. However, I've found it too complicated to use, or rather, too large. It's just larger then what I needed, and too much object-oriented. (Update 2006 apr 19: demerphq shares my opinion in Algorithm::Treap.)

So, I decided to write a new implementation from scratch. Heap still helped me a lot: its source is very readable so I could write most functions by translating its functions.

Without further ado, here's the code:

{
# DEBUG a stats variable
my $_hheap_maxsize = -1;

# the hashed-heap.
# elements of the binary heap are three-elt arrays of:
#   d: the closest distance (the key of the heap),
#   q: the point (the key of the hash),
#   p: the other point it is close from.
# the hash has the points as its keys,
#   and the indices in the heap as their value.

# hheap_init is a private function that creates a hash-heap
sub hheap_init {
 [{}, []];
}

# hheap_dec is a private function that inserts or moves down an elemen
+t in the heap.
# if the new distance is greater than the existing one, nothing happen
+s.
sub hheap_dec {
 my($hheap, $d, $q, $p) = @_;
 my($hash, $heap) = @$hheap;
 #warn "[D120 hheap_dec $d, $q, $p]";
 my($k, $kk);
 if (!exists($$hash{$q})) { # new element, insert it
  $k = @$heap;
 } else {
  $k = $$hash{$q};
  ${$$heap[$k]}[0] <= $d and # no improvement, do nothing
   return;
  # improvement, change it
 }
 # gravitate it down
 while (0 < $k && $d < ${$$heap[$kk = ($k - 1) >> 1]}[0]) {
  $$heap[$k] = $$heap[$kk];
  $$hash{${$$heap[$k]}[1]} = $k;
  $k = $kk;
 }
 # do the actual change
 $$heap[$k] = [$d, $q, $p];
 $$hash{$q} = $k;
 #warn "[D130]";
}

# hheap_pop is a private function for popping the element 
# of the smallest distance (minimal d) from the heap.
# returns the empty list if the heap is empty.
sub hheap_pop {
 #warn "[D220 hheap_pop]";
 my($hheap) = @_;
 my($hash, $heap) = @$hheap;
 my(@r, $m, $k, $j, $t, $d);
 @$heap or
  return;
 $k = 0;
 @r = @{$$heap[$k]};
 delete($$hash{$r[1]});
 $t = pop @$heap;
 $d = $$t[0];
 $m = @$heap or
  return @r;
 {
  $j = ($k << 1) + 1;
  $j < $m or
   last;
  $j + 1 < $m &&
   ${$$heap[$j + 1]}[0] <= ${$$heap[$j]}[0] and
   $j = $j + 1;
  ${$$heap[$j]}[0] < $d and do {
   $$heap[$k] = $$heap[$j];
   $$hash{${$$heap[$k]}[1]} = $k;
   $k = $j;
   redo;
  };
 }
 $$heap[$k] = $t;
 $$hash{$$t[1]} = $k;
 #warn "[D230 @r, $k @$t]";
 @r;
}

# a private debugging function
sub hheap_verify {
 if (0) { # DEBUG
  my($e, $s);
  eval {
   $s = hheap_verify1(@_);
  };
  $e = $@ and do {
   require Dumpvalue;
   $|++;
   Dumpvalue->new->dumpValues([$e, @_, $_hheap_maxsize]);
   die $e;
  };
  $_hheap_maxsize < $s and
   $_hheap_maxsize = $s;
 }
}
sub hheap_verify1 {
 #warn "[D320 hheap_verify]";
 my($hheap) = @_;
 my($hash, $heap) = @$hheap;
 my($k);
 keys(%$hash) == @$heap or
  die "assertion failed: hheap size mismatch: hash " . keys(%$hash) . 
+", heap " . @$heap;
 for $k (0 .. @$heap - 1) {
  ref($$heap[$k]) eq "ARRAY" or
   die "assertion failed: non-array heap element index " . $k;
  exists($$hash{${$$heap[$k]}[1]}) or
   warn("assertion failed: hheap element missing from hash: "), die "i
+ndex " . $k . ", element " . ${$$heap[$k]}[1] . ", distance " . ${$$h
+eap[$k]}[0];
  $k == $$hash{${$$heap[$k]}[1]} or
   warn("assertion failed: hheap index mismatch: "), die "    real ind
+ex " . $k . ", element " . ${$$heap[$k]}[1] . ", hash value " . $$has
+h{${$$heap[$k]}[1]};
 }
 for $k (1 .. @$heap - 1) {
  ${$$heap[($k - 1) >> 1]}[0] <= ${$$heap[$k]}[0] or
   warn("assertion failed: hheap heap mismatch: "), die "parent index 
+" . $k . ", element " . ${$$heap[$k]}[1] . ", distance " . ${$$heap[$
+k]}[0] .
    ", child index " . (($k - 1) >> 1) . ", element " . ${$$heap[($k -
+ 1) >> 1]}[1] . ", distance " . ${$$heap[($k - 1) >> 1]}[0];
 }
 #warn "[D330]";
 0 + @$heap;
}

}
[download]

So, the elements of the heap have three parts: ($d, $p, $q).

$d is the numeric primary key, so hheap_pop shall always pop the element with the smallest key.

$p is the secondary key (a string), which must be unique to all elements. If you use the same secondary key twice, then hheap_dec won't insert a new element, instead it will search the element of the heap with the same key, read its primary key. If the primary key you've given is smaller then the one the element already has, then that of the element will be decreased. Otherwise, the element will be unaffected. (If you don't like this, I'm sorry. This is the operation I needed.)

Finally, $q contains extra information not used by the heap functions but returned when you get back the element.

So, the functions are these. $h = hheap_init() creates a new heap, which is to be treated as an opaque data structure. The other functions all operate on this structure. hheap_dec($h, $p, $q, $d) inserts a new element or possibly changes an existing one, as defined above. ($d, $p, $q) = hheap_pop() removes and returns the element from the heap with the smallest $d.

Later, I've converted the program to C++. There, the equivalent of $p was a small integer, so I could use an array instead of a hash table as lookup table.

Update 2008 jul 3: I also have a variant of this core ported to ruby.

Comment on Binary heap Select or Download Code

Replies are listed 'Best First'.
Re: Binary heap by perrin (Chancellor) on Dec 07, 2005 at 21:50 UTC
See also Heap::Fibonacci (part of the "Heap" distribution) and Heap::Simple::XS.	[reply]
Re^2: Binary heap by ambrus (Abbot) on Dec 07, 2005 at 21:59 UTC
I've looked. Heap::Fibonacci is one of the three engines of the Heap module. Heap::Simple::XS in an engine for Heap::Simple, although it's distributed separately (I guess that's to keep Heap::Simple a pure-perl module). The other engine is Heap::Simple::Perl. Both of Heap and Heap::Simple are written in such a way that they can have more than one backend. If you really want to know, the other heap modules on CPAN are not these but Array::Heap2 and Heap::Priority. I've glanced at those too at that time, but I don't remember my conclusion about them.	[reply]
Re: Binary heap by ambrus (Abbot) on Aug 24, 2015 at 21:00 UTC
See also Re: Size-limited, fitness-based lists.	[reply]