Hash Set

tomazos has asked for the wisdom of the Perl Monks concerning the following question:

I want a set.

I think the best way is to use a hash to implement a set by using the existant keys to represent the current members of the set.

My current strategy is along the lines of:

my %set;
sub contains($) { return exists $set{$_[0]} }
sub add($) { $set{$_[0]} = 1 }
sub all($) { return [keys %set] }
[download]

(I'm not actually using subs -- just for clarity.)

Is this approach the simplest way?

Is there a better approach avoiding the assignment for the add? Something like create $set{'foo'}?

I think I don't need the hash entries to be defined. I just need them to exist. (If you iterate over keys it will still return entries that have existant but undefined values right?)

-Andrew.

Andrew Tomazos | andrew@tomazos.com | www.tomazos.com

Comment on Hash Set Select or Download Code

Replies are listed 'Best First'.
Re: Hash Set by ambrus (Abbot) on Sep 11, 2005 at 07:15 UTC
I think the best way is to use a hash to implement a set by using the existant keys to represent the current members of the set. ... Is this approach the simplest way? Probably yes, but really it depends on what operations you want to do on the set. If you, for example, never want to check if certain elements are included, you only want to iterate over the set, then it's better to use an array (or a list). This case is very common, just most people don't call that a set, but a list. C++ STL doesn't call that a set either. Also, if the set consists of small integers, and you don't want to iterate over them, then it may be better to use an array like `my @set; sub contains { @set{$_[0]}; } sub add { @set{$_[0]} = 1; }` [download] Update: parv is right, I don't know what I was thinking when I wrote the above code. The right code is `my @set; sub contains { $set[$_[0]]; } sub add { $set[$_[0]] = 1; }` [download] Or you can use a bit vector like `my $set = ""; sub contains { vec($set, $_[0], 1); } sub add { vec($set, $_[0], 1) = 1; }` [download] Is there a better approach avoiding the assignment for the add? Something like create `$set{'foo'}`? Not really. Some people (me too sometimes) use `$set{"foo"}++;` but that's not really better at all, only shorter. I think I don't need the hash entries to be defined. I just need them to exist. (If you iterate over keys it will still return entries that have existant but undefined values right?) That's right. If you do not insist on values to be defined, just exist, the code you gave should still work. Moreover, there's a little advantage in that approach: you can add multiple elements to the set with a shorter syntax: `$set{@elements} = ();` [download] I however sometimes still represent a set in such a way that all values are `1`, because if I have to test for inclusion in the set many times in the code, I can just write `$set{$elt}` instead of `exists($set{$elt})` that way.	[reply] [d/l] [select]
Re^2: Hash Set by parv (Parson) on Sep 11, 2005 at 07:49 UTC
> Also, if the set consists of small integers, and you > don't want to iterate over them, then it may be better > to use an array like > > my @set; > sub contains { @set{$_[0]}; } > sub add { @set{$_[0]} = 1; } It seems -- from the operations in the `sub`s -- you have declared a hash called `%set` somewhere not shown. That, or you are confused about the hash and array operations, in which case correct syntax for working w/ the array would be ... `sub contains { $set[ $_[0] ]; } sub add { $set[ $_[0] ] = 1; }` [download] - parv	[reply] [d/l] [select]
Re^2: Hash Set by tomazos (Deacon) on Sep 11, 2005 at 07:56 UTC
A list can contain multiple identical elements. A set cannot. I want it to be able to contain any scalar (not just integers). Specifically the operations I want to do are as per the subs I gave in my example: add an element (no-op if already a member) test if an element is contained iterate over all the elements -Andrew. Andrew Tomazos \| andrew@tomazos.com \| www.tomazos.com	[reply]
Re: Hash Set by parv (Parson) on Sep 11, 2005 at 08:05 UTC
Yes, you are correct in that values do not need to be defined. So in your `add()`, you could just assign `undef`. BTW, have you looked at set modules available via CPAN? They may come handy (no personal experience) if you would need yet unstated operations at a later date. If nothing, they may give you ideas about other implementaions.	[reply] [d/l] [select]


Don't ask to ask, just ask
	PerlMonks