
The problem with huge and fast is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty; having 5 500M(++) daemons lying around is not funny.

Ah, I misunderstood what you meant by 'huge'. Still, if memory is your concern, that sounds like an even better reason to use a DB and let the DB handle the intersection calculations. BTW, what solution for intersection handling results in a 500MB memory footprint?! I'd like to know so I can avoid that myself.

For the purpose of blacklisting, it might be small-and-fast to convert your list of addresses into a hash instead. Assuming you've already populated @blacklist and @address, your intersection sub might look like:

my @BlackListed = intersect_of(\@blacklist, \@address);

## takes two array refs, returns the elements common to both
## (no prototype: the call above precedes the definition, so it would be ignored)
sub intersect_of {
   my ($list_a, $list_b) = @_;
   my (%set_a, %set_b);

   ## put the larger set in %set_a
   if (@$list_a > @$list_b) {
      %set_a = map { $_ => undef } @$list_a;
      %set_b = map { $_ => undef } @$list_b;
   }
   else {
      %set_a = map { $_ => undef } @$list_b;
      %set_b = map { $_ => undef } @$list_a;
   }

   my @intersect;

   ## iterate through smaller set
   for (keys %set_b) {
      push @intersect, $_ if exists $set_a{$_};
   }

   return @intersect;
}

This exact code is untested, but I have used code like it for whitelist/blacklist processing with address list files of about 5M each, and it performed quite acceptably. YMMV, of course.
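If memory is the real constraint, a variation on the same idea might be to keep only the blacklist in a hash and stream the address list one line at a time instead of loading both sets. The following is a minimal sketch, not the original poster's setup: the sub name blacklisted_from_handle, the sample addresses, and the one-address-per-line format are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each address read from $fh that also
# appears in the blacklist (array ref). Only the blacklist is held in
# memory as a hash; the address list is read line by line.
sub blacklisted_from_handle {
    my ($blacklist, $fh) = @_;

    my %is_blacklisted;
    @is_blacklisted{@$blacklist} = ();    # hash slice: keys only, values undef

    my (@hits, %seen);
    while (my $address = <$fh>) {
        chomp $address;
        next unless exists $is_blacklisted{$address};
        push @hits, $address unless $seen{$address}++;    # report each hit once
    }
    return @hits;
}

# Usage: an in-memory filehandle stands in for a real address file
# (assumed format: one address per line).
my @blacklist = ('spam@example.com', 'junk@example.com');
my $addresses = join "\n",
    'alice@example.com', 'spam@example.com',
    'bob@example.com',   'spam@example.com', '';

open my $fh, '<', \$addresses or die "Cannot open in-memory handle: $!";
my @hits = blacklisted_from_handle(\@blacklist, $fh);
close $fh;

print "$_\n" for @hits;
```

The trade-off is that the footprint is bounded by the blacklist plus the matches rather than by both lists, at the cost of assuming the blacklist is the smaller of the two.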

Yoda would agree with Perl design: there is no try{}

In reply to Re^3: Finding an intersection of two sets, lean and mean by radiantmatrix
in thread Finding an intersection of two sets, lean and mean by Sinister
