in reply to List processing performance

Well, in general you're doing pretty well, I think. You're using hashes for uniqueness lookups, which is always a good first step. :)

One thing that came to mind: why not combine the two steps? You'd end up with something like this:

    my($self) = @_;
    my(%seen, %r);
    my @common = qw(a and at);
    @seen{@common} = ();              # hash slice: keys exist, values undef
    for my $r (@words) {
        next if exists $seen{ lc $r };   # skip common words
        $r{ lc $r } = 1;                 # record each remaining word once
    }
    my @words = sort keys %r;
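If you want to try the combined pass on its own, here's a self-contained sketch. The sample word list is invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample data: mixed case, duplicates, and some common words.
my @words  = qw(At the Zoo a zebra and a lion AND the keeper);
my @common = qw(a and at the);

my %seen;
@seen{@common} = ();          # slice assignment: keys exist, values undef

my %r;
for my $w (@words) {
    next if exists $seen{ lc $w };   # drop common words, case-insensitively
    $r{ lc $w } = 1;                 # each remaining word recorded once
}
@words = sort keys %r;

print "@words\n";             # keeper lion zebra zoo
```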

Replies are listed 'Best First'.
Re: List processing performance
by autark (Friar) on Jul 11, 2000 at 20:30 UTC
    playing golf,
    my @words = ( ... );
    my @common = qw|a and at|;
    my %seen;
    @seen{@common} = (1) x @common;
    @words = grep { $_ = lc; ! $seen{$_}++ } sort @words;
    :-) Autark
      Oh, is *that* what we're doing then. :) In which case replace your last line with this:
      @words = sort grep !$seen{$_ = lc}++, @words
      A couple of characters shorter, plus it's faster because the sort runs after you've filtered out the duplicates and common words, so it operates on a shorter list.
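      If you want to check the speed claim for yourself, a quick Benchmark sketch along these lines should do it (the word list and repeat count are invented):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented test data: many duplicates, so filtering first pays off.
my @words  = map { qw(Apple and banana A Cherry) } 1 .. 2000;
my @common = qw(a and at);

cmpthese(-1, {
    sort_then_filter => sub {
        my %seen;
        @seen{@common} = (1) x @common;
        my @out = grep { !$seen{ lc $_ }++ } sort @words;
    },
    filter_then_sort => sub {
        my %seen;
        @seen{@common} = (1) x @common;
        my @out = sort grep { !$seen{ lc $_ }++ } @words;
    },
});
```

      Note this sketch uses `lc $_` in the lookup only, so it doesn't modify @words through grep's aliasing of $_ between benchmark iterations.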
        Golf, eh? @words = sort grep !$seen{+lc}++, @words; The + keeps lc from being interpreted as a hash key, and lc operates on $_ by default.

        Update: Of course, this works if your data set is all lowercase. Moral of the story, don't go for the birdie unless you're sure it's what you want.
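        To see the difference on mixed-case data, here's a side-by-side sketch of the two golfed forms (the word list is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @common = qw(a and at);

# With $_ = lc: the assignment lowercases surviving words as a side effect.
my @in1 = qw(Foo and bar FOO At baz);
my %s1;
@s1{@common} = (1) x @common;
my @out1 = sort grep !$s1{ $_ = lc }++, @in1;
print "@out1\n";   # bar baz foo

# With +lc: filters case-insensitively but leaves the original case alone,
# so whichever spelling comes first survives.
my @in2 = qw(Foo and bar FOO At baz);
my %s2;
@s2{@common} = (1) x @common;
my @out2 = sort grep !$s2{ +lc }++, @in2;
print "@out2\n";   # Foo bar baz
```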

RE: Re: List processing performance
by Odud (Pilgrim) on Jul 11, 2000 at 20:25 UTC
    A good point about combining the two parts - the real code is actually two subroutines, but only because I developed it in stages; there's no compelling reason for it. Your method also eliminates the temporary list, which was bugging me. Thanks muchly.