Many of you are experts in Perl performance, and I would appreciate your comments on the following code snippets. They do the job, but I have no idea whether they are particularly efficient. What I am trying to do is produce a sorted list of words from which duplicates and common words (a, at, the, if, etc.) have been removed.

The size of the list is typically about 300 items but could, in theory, be much larger.

The first thing I do is produce a sorted unique list using some code that I saw somewhere (probably the Perl Cookbook):

    # @words is the list of words (composed of letters and numbers only, no punctuation)
    my($self) = @_;
    foreach $r (@words) {
        $r{lc $r} = 1;
    }
    @words = sort keys %r;
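For comparison, here is a minimal sketch of the same lower-case, de-duplicate, and sort step written with lexical variables under `use strict`. The sample word list and the name `%uniq` are assumptions for illustration only; the approach is the same hash-key trick as the loop above, not a different algorithm.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the real @words.
my @words = qw(Perl monks perl Sort list Monks);

# Hash slice: each lower-cased word becomes a key of %uniq.
# Duplicate keys collapse, so only one copy of each word survives.
my %uniq;
@uniq{ map { lc($_) } @words } = ();

# The keys are unique, so sorting them yields the final list.
@words = sort keys %uniq;

print "@words\n";
```

For a list of a few hundred words, any difference from the explicit `foreach` loop is likely to be negligible; the core `Benchmark` module is one way to check on real data.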
The next thing I do is check whether any of the words are also in the common list; if so, they don't get carried forward. I'm suspicious of this code because @only is a temporary variable, which often points to an improvement being possible.

    # %seen is an ordinary hash (not a pseudo-hash) whose keys are the common words,
    # filled with a hash slice
    @common = qw(a and at); # bigger in real life
    @seen{@common} = ();
    foreach $item (@words) {
        push(@only,$item) unless exists $seen{$item};
    }
    @words = @only;
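One possible way to drop the temporary array is to filter with `grep`, which returns the elements for which its block is true. This is a sketch under assumptions: the sample data is made up, and it keeps the same `%seen` lookup hash as the code above.

```perl
use strict;
use warnings;

# Hypothetical sample data; @words is assumed to be sorted and unique already.
my @words  = qw(a and at list perl sort);
my @common = qw(a and at);    # bigger in real life

# Hash slice: every common word becomes a key of %seen (the values are undef).
my %seen;
@seen{@common} = ();

# Keep only the words that are not keys of %seen.
@words = grep { !exists $seen{$_} } @words;

print "@words\n";
```

Each `exists` test is a hash lookup, so the cost of the filter grows roughly in proportion to the number of words rather than with the size of the common list. Running the filter before the sort would also leave fewer items to sort, although at around 300 words the effect is probably too small to notice.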
I'd appreciate your comments and suggestions very much as this is an area in which I'm happy with the basics but haven't much knowledge of the finer points.

In reply to List processing performance by Odud
