We are going to start off by looking at something we have all done. The example comes from a great article on adding search functionality to Perl applications:
my %stopwords;
@stopwords{(qw(a i at be do to or is not no the that they then these them who where why can find on an of and it by))} = 1 x 27;
What's he doing here? He is building a hash so that later on he can check for stopwords like so:
my @salient_word = grep { not $stopwords{$_} } @word ;

Now, what is wrong with what he did? Well, nothing, but there is a module on CPAN, Set::Scalar, that makes this task more DWIM instead of DWIS.

Here is how the same coding of stop words would be done using it:

use Set::Scalar;
my $stopwords = Set::Scalar->new;
$stopwords->insert( qw(a i at be do to or is not no the that they then these them who where why can find on an of and it by));
my @word = split /\s+/, "Oh say can you see by the dawn's early light";
And then we check it like this:
my @salient_word = grep { not $stopwords->has($_) } @word ;
or like this
my $word = Set::Scalar->new(@word);
my $salient_word = $word - $stopwords;
# or this:
# my $salient_word = $word->difference($stopwords);

Other Goodies

  • $s->invert(@members); : insert each member if it isn't in the set, delete it if it is. Or, codewise:
    my $members = Set::Scalar->new(@members);
    my $not_in = $members - $s;              # members not yet in $s
    my $in = $s->intersection($members);     # members already in $s
    $s->delete($in->members);
    $s->insert($not_in->members);
  • union, intersection, difference : you know what these do.
  • symmetric_difference : the symmetric difference of sets $a and $b can be computed like this:
    $N = ($a - $b);
    $N->insert(($b - $a)->members);
    In other words, it is ($a - $b) UNION ($b - $a)
  • unique : I don't understand this one.
  • complement : I don't understand this one.
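For comparison, here is a minimal sketch of what the operations in the list above amount to when written with plain hashes and no module. The helper names (make_set, difference, symmetric_difference, invert) are hypothetical and chosen only for illustration; this is not a description of how Set::Scalar is implemented internally.

```perl
use strict;
use warnings;

# Hypothetical helpers: each "set" is a hash ref whose keys are the members.
sub make_set { my %s; @s{@_} = (); return \%s }

# Members of $x that are not in $y.
sub difference {
    my ($x, $y) = @_;
    return make_set(grep { !exists $y->{$_} } keys %$x);
}

# ($x - $y) UNION ($y - $x)
sub symmetric_difference {
    my ($x, $y) = @_;
    return make_set(keys %{ difference($x, $y) }, keys %{ difference($y, $x) });
}

# Toggle each member in place: delete it if present, insert it if absent.
sub invert {
    my ($set, @members) = @_;
    for my $m (@members) {
        if (exists $set->{$m}) { delete $set->{$m} }
        else                   { $set->{$m} = undef }
    }
    return $set;
}

my $p = make_set(qw(a b c));
my $q = make_set(qw(b c d));
print join(' ', sort keys %{ symmetric_difference($p, $q) }), "\n";  # a d
print join(' ', sort keys %{ invert($p, qw(a z)) }), "\n";           # b c z
```

The module buys you readable method names and overloaded operators in exchange for this bookkeeping.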

Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality.

Replies are listed 'Best First'.
Re: Set::Scalar saves you from hash acrobatics
by Jenda (Abbot) on Oct 06, 2003 at 18:25 UTC
    use Benchmark;
    use Set::Scalar;

    my $stopwords = Set::Scalar->new;
    $stopwords->insert( qw(a i at be do to or is not no the that they then these them who where why can find on an of and it by));

    my %stopwords;
    @stopwords{qw(a i at be do to or is not no the that they then these them who where why can find on an of and it by)} = ();

    @word = split /\s+/, "Oh say can you see by the dawn's early light";

    sub withSetScalar {
        my @salient_word = grep { not $stopwords->has($_) } @word;
    }

    sub byHash {
        my @salient_word = grep { not exists $stopwords{$_} } @word;
    }

    timethese 100000, {
        withSetScalar => \&withSetScalar,
        byHash => \&byHash,
    };
    __END__
    Benchmark: timing 100000 iterations of byHash, withSetScalar...
    byHash: 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 94161.96/s (n=100000)
    withSetScalar: 9 wallclock secs ( 8.05 usr + 0.01 sys = 8.06 CPU) @ 12403.87/s (n=100000)

    I don't think the "nicer" syntax is worth the price. (If the sentence/text were longer, the difference would be even bigger.)

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne


      I don't have a book on algorithms and data structures at hand (and googling did not turn up any useful results), but I remember from my education that there are data structures for set operations with slightly better complexity characteristics than a hash.

      By the way, the documentation of this module should note which algorithm it uses.

Re: Set::Scalar saves you from hash acrobatics
by Not_a_Number (Prior) on Oct 06, 2003 at 19:03 UTC

    Another word on the syntax: it's slightly wrong :-)

    my %hash;
    @hash{qw(foo bar baz)} = 1 x 3;
    print "$_ => $hash{$_}\n" for keys %hash;

    There should be parentheses around the '1':

    @hash{qw(foo bar baz)} = (1) x 3;
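
    A small sketch of the same fix that also avoids hard-coding the count: the repetition is sized from the key list itself. The variable names (@keys, %broken) are illustrative assumptions and do not come from the original article.

```perl
use strict;
use warnings;

my @keys = qw(foo bar baz);

# With parentheses, x repeats a list. @keys on the right-hand side is
# evaluated in scalar context, so it yields the number of keys.
my %hash;
@hash{@keys} = (1) x @keys;

# Without parentheses, x builds a single string that is assigned to the
# first key only; the remaining keys are left undefined.
my %broken;
@broken{@keys} = 1 x 3;

print "$_ => $hash{$_}\n" for sort keys %hash;
print "broken foo => $broken{foo}\n";
```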

    dave

    Update: Oops, Ovid beat me to it by a couple of minutes!

Re: Set::Scalar saves you from hash acrobatics
by shotgunefx (Parson) on Oct 06, 2003 at 18:48 UTC
    One comment on the first syntax... I've always preferred to populate hashes like so in those situations.
    my %stopwords = map { $_ => 1 } qw(a i at be do to or is not no the that they then these them who where why can find on an of and it by );


    -Lee

    "To be civilized is to deny one's nature."

      Not to mention the fact that the original code has a bug. There needs to be parentheses around the "1" to force list context.

      #!/usr/bin/perl
      use strict;
      use Data::Dumper;

      my (%hash1, %hash2);
      my @keys = qw/foo bar baz/;
      @hash1{@keys} = 1 x 3;
      @hash2{@keys} = (1) x 3;
      print Dumper \%hash1, \%hash2;

      That generates:

      $VAR1 = {
                'foo' => '111',
                'baz' => undef,
                'bar' => undef
              };
      $VAR2 = {
                'foo' => 1,
                'baz' => 1,
                'bar' => 1
              };

      Cheers,
      Ovid

      New address of my CGI Course.

Re: Set::Scalar saves you from hash acrobatics
by kabel (Chaplain) on Oct 07, 2003 at 06:39 UTC

    Does anybody remember this article about overloading at perl.com?

    I patched Set::Scalar a little some time ago to support the '{1 2 3}' and '{1 .. 3}' notations. The code is broken, but it should suffice for some experiments of your own: just put it somewhere at the top of it, and off you go. Be sure to throw the code far, far away afterwards ;-) (where no code has gone before)

    kabel@linux:~> perl -w -MSet::Scalar
    my $some_ints = '{1 .. 4}';
    my $other_ints = '{3 .. 8}';
    my $all_ints = $some_ints + $other_ints;
    print '[', join( ', ', sort $all_ints->members ), "]\n";
    print $Set::Scalar::VERSION, " ", $], $/;
    [1, 2, 3, 4, 5, 6, 7, 8]
    1.17 5.008
    kabel@linux:~>