This finds the intersection of two sets. That, by itself, is nothing new; my problem, however, is space vs. time. I don't want the trade-off: I want it to be fast and small, whereas the CPAN modules that do what I want are always either small and slow or huge and fast. I use this to apply blacklists to address files.

The idea: bucket the bigger list by the first $keyLength characters of each entry, keep each bucket as one sorted, separator-delimited string, and look up the entries of the smaller list with index(). The snippet below is a fully functional program. Play around with $keyLength to see whether performance improves (or not).
use strict;
use warnings;

my $sep       = "|";    # must not occur inside any entry
my $keyLength = 6;

open(my $blacklistFh, '<', 'blacklist') or die "Cannot open blacklist: $!";
open(my $addressFh,   '<', 'address')   or die "Cannot open address: $!";
chomp(my @blacklist = <$blacklistFh>);
chomp(my @address   = <$addressFh>);
close($blacklistFh);
close($addressFh);

my @blackListed = intersection(\@blacklist, \@address);
print "Found: " . scalar(@blackListed) . "\n";

sub intersection {
    my ( $list1, $list2 ) = @_;

    # turn the biggest list into a strings hash
    # AND loop through the smallest list.
    my ( $big, $small ) = @$list1 > @$list2 ? ( $list1, $list2 ) : ( $list2, $list1 );
    my %strings = makeStringsHash($big);

    my @intersection = ();

    # for each key remember the last position ( the strings in the hash
    # are sorted, remember? ). Resuming from the last hit is only safe if
    # the smallest list is walked in sorted order as well, hence the sort.
    my %lastPos = ();
    foreach my $entry ( sort @$small ) {
        my $key = substr($entry, 0, $keyLength);
        next unless exists $strings{$key};    # no bucket, no match
        my $pos = $lastPos{$key} || 0;
        my $tmp = index( $strings{$key}, $sep.$entry.$sep, $pos );

        # if we found it in the big-list, add it to the intersection
        if ( $tmp != -1 ) {
            push @intersection, $entry;
            $lastPos{$key} = $tmp;
        }
    }
    return @intersection;
}

sub makeStringsHash {
    my ( $list ) = @_;
    my %strings = ();

    # one string per key: all entries sharing the same first $keyLength
    # characters, sorted, each wrapped in separators
    $strings{substr($_, 0, $keyLength)} .= $sep . $_ . $sep
        foreach ( sort @$list );
    return %strings;
}
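To make the effect of $keyLength easier to see, here is a small sketch that builds the same kind of bucket strings as makeStringsHash above for two different key lengths and prints them. It is an illustration only: bucketize() and the sample addresses are made up for this example and are not part of the program above.

```perl
use strict;
use warnings;

# Illustration only: same bucketing idea as makeStringsHash, but with the
# key length and separator passed in so two settings can be compared.
# bucketize() and the sample entries are hypothetical.
sub bucketize {
    my ( $list, $keyLength, $sep ) = @_;
    my %strings;

    # append every entry, in sorted order, to the string of its bucket
    $strings{ substr( $_, 0, $keyLength ) } .= $sep . $_ . $sep
        foreach sort @$list;
    return \%strings;
}

my @sample = qw( bob@example.net alice@example.net bill@example.org alan@example.com );

foreach my $keyLength ( 1, 3 ) {
    my $strings = bucketize( \@sample, $keyLength, "|" );
    print "keyLength $keyLength: " . scalar( keys %$strings ) . " buckets\n";
    print "  $_ => $strings->{$_}\n" foreach sort keys %$strings;
}
```

With these four made-up addresses, a key length of 1 gives two buckets and a key length of 3 gives four. A short key means few hash entries but long strings for index() to scan; a long key means many short strings and more hash overhead. Where the sweet spot lies probably depends on your data, so measure with your own lists.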

In reply to Finding an intersection of two sets, lean and mean by Sinister
