We have a set of elements. Should we shove it in an array or
a hash? If the set is unordered, it is clear that the choice
is to use a hash:
$hash{$_}++ for qw( one_element another still_another foo foo );
If we have duplicate elements we get almost for free as associated
values the number of duplicates for a given element. Note that we don't have lost the
ability to iterate over the set. In the following iteration, duplicates are made
consecutive.
print "$_\n" for map { my $i=$_; map { $i } 1.. $hash{$_} } keys %hash
If the set is ordered, it may be a stack or a queue; or a static list which elements are accessed by rank. In Perl all of those are stored in arrays. We don't have to remember fancy classes names here unlike the OO bondage languages.
Now the interesting case... and less intuitive. The set has no duplicates and is ordered by some (possibly costly) function of the elements values and we are adding and supressing elements all the time. Because it is ordered, we are tempted to use an array; but insertion and deletion in a an array are complex and costly (splices), We may think of a linked list, but really the natural choice is usually the hash (or 2 hashes). Let's call val the ordering function.
We start by the case of one hash
Note that if we don't have an explicit ordering function, the order
is just the the elements appearance order:
my $i=0; $hash{$_}=$i++ for qw( one_element another still_another foo );
Adding an element: $hash{$elt}= val $elt
Suppressing an element: delete $hash{$elt}
Iterating with order: print "$_\n" for sort { $hash{$a} <=> $hash{$b} } keys %hash;
Because we have a one to one correspondance between keys and
values, we could use a an additional hash: a "reverse" hash
%rev. It has the property: $hash{$rev{$_}} == $_ == $rev{$hash{$_}}
Iterating would be cheap:
print "$rev{$_}\n" for sort keys %rev;
But the price to pay is to maintain two hashes instead of one when
adding or deleting elements. So depending if we spend more time
iterating over element or adding and deleting elements, we should use
respectively two or one hash. Space consideration may also a concern
if using two hashes.
In both cases, each iteration incurs the cost of a sort: n log2 n
So if adding and deleting an element is really the rare case compared to iterating, than you are better using an array and the joice of splicing :(
Sure enough, when we want to more operations than adding, deleting and iterating, richer data structures must be considered.
This reflexion was triggered by On Removing Elements from Arrays
Corrected to suppress the unneeded Schwartzian Transform as suggested below by merlyn below. The ordering function is already cached as value associated to key. The whole point of an ST is to avoid to recalculate that function over and over. The correction does not change the validity of the points made in that node. Thanks merlyn
-- stefp
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Using a hash for an ordered set can make sense
by merlyn (Sage) on Sep 23, 2001 at 09:03 UTC | |
Re (tilly) 1: Using a hash for an ordered set can make sense
by tilly (Archbishop) on Sep 23, 2001 at 22:33 UTC | |
Re: Using a hash for an ordered set can make sense
by dragonchild (Archbishop) on Sep 24, 2001 at 17:14 UTC |