Set::Scalar saves you from hash acrobatics

We are going to start off by looking at something we have all done. The example comes from a great article on adding search functionality to Perl applications:


my %stopwords;
@stopwords{(qw(a i at be do to or is not no the 
               that they then these them who where 
               why can find on an of and it by))} = 1 x 27;

What's he doing here? He is building a hash so that later on he can check for stopwords like so:

my @salient_word = grep { not $stopwords{$_} } @word ;

Now, what is wrong with what he did? Well, nothing, but there is a module on CPAN, Set::Scalar, that makes this task more DWIM (Do What I Mean) instead of DWIS (Do What I Say).

Here is how the same coding of stop words would be done using it:

use Set::Scalar;
$stopwords = Set::Scalar->new;
$stopwords->insert( qw(a i at be do to or is not no the 
               that they then these them who where 
               why can find on an of and it by));
@word = split /\s+/, "Oh say can you see by the dawn's early light";

And then we check it like this:

my @salient_word = grep { not $stopwords->has($_) } @word ;

Or like this:

$word = Set::Scalar->new(@word);
my $salient_word = $word - $stopwords;
# or this:
my $salient_word = $word->difference($stopwords);

Other Goodies

$s->invert(@members) : inserts each member that isn't in the set and deletes each one that is. Or, codewise:

$members = Set::Scalar->new(@members);

my $not_in = $s - $members;
my $in     = $s->intersection($members);

$s->delete($in->members);
$s->insert($not_in->members);
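For comparison, the same toggle can be sketched with a plain hash standing in for the set. This is only an illustration of the behaviour described above, not how Set::Scalar implements it; `toggle_members` is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Set::Scalar): toggle membership of each
# item in a hash-based set, mirroring the described behaviour of invert.
sub toggle_members {
    my ($set, @members) = @_;
    for my $m (@members) {
        if (exists $set->{$m}) {
            delete $set->{$m};    # already in the set: remove it
        }
        else {
            $set->{$m} = 1;       # not in the set: add it
        }
    }
    return $set;
}

my %s = map { $_ => 1 } qw(a b c);
toggle_members(\%s, qw(b c d));
print join(' ', sort keys %s), "\n";    # prints "a d"
```

Every one of these lines is bookkeeping that a single invert call would hide.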

union, intersection, difference : you know what these do.
symmetric_difference : the symmetric difference of sets $a and $b is performed like this:
```
 $N = ($a - $b); 
 $N->insert(($b - $a)->members);  # insert the members, not the set object itself
```
In other words, it is ($a - $b) UNION ($b - $a)
unique : I don't understand this one.
complement : I don't understand this one.
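As a rough illustration of the symmetric difference described above, here is a minimal core-Perl sketch that treats two lists as sets. The helper name `symmetric_difference` and its array-reference interface are assumptions made for this example; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: symmetric difference of two lists treated as sets,
# i.e. (x - y) UNION (y - x). Takes two array references.
sub symmetric_difference {
    my ($x, $y) = @_;
    my %in_x = map { $_ => 1 } @$x;
    my %in_y = map { $_ => 1 } @$y;

    my @only_x = grep { !$in_y{$_} } keys %in_x;    # x - y
    my @only_y = grep { !$in_x{$_} } keys %in_y;    # y - x

    my @all = sort(@only_x, @only_y);               # string sort, for stable output
    return @all;
}

my @result = symmetric_difference([qw(a b c)], [qw(b c d)]);
print "@result\n";    # prints "a d"
```

Elements present in both inputs (b and c here) drop out, and only those found in exactly one of the two remain.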

Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality.


Replies are listed 'Best First'.
Re: Set::Scalar saves you from hash acrobatics by Jenda (Abbot) on Oct 06, 2003 at 18:25 UTC
use Benchmark;
use Set::Scalar;

my $stopwords = Set::Scalar->new;
$stopwords->insert( qw(a i at be do to or is not no the
               that they then these them who where
               why can find on an of and it by));

my %stopwords;
@stopwords{qw(a i at be do to or is not no the
              that they then these them who where
              why can find on an of and it by)} = ();

@word = split /\s+/, "Oh say can you see by the dawn's early light";

sub withSetScalar {
    my @salient_word = grep { not $stopwords->has($_) } @word;
}

sub byHash {
    my @salient_word = grep { not exists $stopwords{$_} } @word;
}

timethese 100000, {
    withSetScalar => \&withSetScalar,
    byHash        => \&byHash,
};

__END__
Benchmark: timing 100000 iterations of byHash, withSetScalar...
byHash: 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 94161.96/s (n=100000)
withSetScalar: 9 wallclock secs ( 8.05 usr + 0.01 sys = 8.06 CPU) @ 12403.87/s (n=100000)

I don't think the "nicer" syntax is worth the price. (If the sentence/text was longer the difference would be even bigger.)

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
   -- Rick Osborne
Re: Re: Set::Scalar saves you from hash acrobatics by zby (Vicar) on Oct 07, 2003 at 09:22 UTC
I don't have a book on algorithms and data structures at hand (and googling did not turn up any useful results), but I remember from my education that there are data structures for set operations with slightly better complexity characteristics than a hash. By the way, the documentation of this module should note which algorithm it uses.
Re: Set::Scalar saves you from hash acrobatics by Not_a_Number (Prior) on Oct 06, 2003 at 19:03 UTC
Another word on the syntax: it's slightly wrong :-)

my %hash;
@hash{qw(foo bar baz)} = 1 x 3;
print "$_ => $hash{$_}\n" for keys %hash;

There should be parentheses around the '`1`': `@hash{qw(foo bar baz)} = (1) x 3;`

dave

Update: Oops, Ovid beat me to it by a couple of minutes!
Re: Set::Scalar saves you from hash acrobatics by shotgunefx (Parson) on Oct 06, 2003 at 18:48 UTC
One comment on the first syntax... I've always preferred to populate hashes like so in those situations.

my %stopwords = map { $_ => 1 } qw(a i at be do to or is not no the
                                   that they then these them who where
                                   why can find on an of and it by );

-Lee

"To be civilized is to deny one's nature."
Re: Re: Set::Scalar saves you from hash acrobatics by Ovid (Cardinal) on Oct 06, 2003 at 19:00 UTC
Not to mention the fact that the original code has a bug. There needs to be parentheses around the "1" to force list context.

#!/usr/bin/perl
use strict;
use Data::Dumper;

my (%hash1, %hash2);
my @keys = qw/foo bar baz/;
@hash1{@keys} = 1 x 3;
@hash2{@keys} = (1) x 3;
print Dumper \%hash1, \%hash2;

That generates:

$VAR1 = {
          'foo' => '111',
          'baz' => undef,
          'bar' => undef
        };
$VAR2 = {
          'foo' => 1,
          'baz' => 1,
          'bar' => 1
        };

Cheers,
Ovid

New address of my CGI Course.
Re: Set::Scalar saves you from hash acrobatics by kabel (Chaplain) on Oct 07, 2003 at 06:39 UTC
Does anybody remember this article about overloading at perl.com? I patched Set::Scalar a little bit some time ago to support the '{1 2 3}' and '{1 .. 3}' notations. The code is broken, but should suffice for some experiments of your own ... just put it somewhere at the top of it, and off you go. Be sure to throw the code far, far away afterwards ;-) (where no code has gone before)

kabel@linux:~> perl -w -MSet::Scalar
my $some_ints = '{1 .. 4}';
my $other_ints = '{3 .. 8}';
my $all_ints = $some_ints + $other_ints;
print '[', join( ', ', sort $all_ints->members ), "]\n";
print $Set::Scalar::VERSION, " ", $], $/;
[1, 2, 3, 4, 5, 6, 7, 8]
1.17 5.008
kabel@linux:~>