http://qs1969.pair.com?node_id=471246

I have taken this into serious consideration; however, the impact does not justify it.

The problem with huge and fast is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty, and having five 500M(++) daemons lying around is not funny.

I need to preserve RAM (preferably using no more than the size of the file) and be fast with my blacklisting, since operators are getting more and more demanding.

Dragonchild's solution was my first attempt as well. It consumes less memory but it is slow(er), so I came up with the above piece of code... TIMTOWTDI, remember?
• Comment on Re^2: Finding an intersection of two sets, lean and mean

Re^3: Finding an intersection of two sets, lean and mean
by radiantmatrix (Parson) on Jun 30, 2005 at 16:26 UTC

The problem with huge and fast is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty, and having five 500M(++) daemons lying around is not funny.

Ah, I misunderstood what you meant by 'huge'. Still, if memory is your concern, that sounds like an even better reason to use a DB and let the DB handle the intersection calculations. BTW, what solution for intersection handling results in a 500MB memory footprint?! I'd like to know so I can avoid that myself.
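
To make the DB suggestion concrete, here is a minimal sketch, assuming DBI with the DBD::SQLite driver is installed. The file name `intersect_demo.db`, the table and column names, and the sample addresses are all hypothetical; any database that supports `INTERSECT` could stand in for SQLite.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical sample data; real lists would come from wherever they live.
my @blacklist = qw(spam@example.com abuse@example.net);
my @address   = qw(alice@example.org spam@example.com bob@example.org);

# An on-disk SQLite file (name is an assumption) holds the sets,
# so they do not have to sit in the daemon's own memory.
my $dbh = DBI->connect('dbi:SQLite:dbname=intersect_demo.db', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Start from a clean slate so the sketch can be re-run.
$dbh->do('DROP TABLE IF EXISTS blacklist');
$dbh->do('DROP TABLE IF EXISTS address');
$dbh->do('CREATE TABLE blacklist (addr TEXT PRIMARY KEY)');
$dbh->do('CREATE TABLE address (addr TEXT PRIMARY KEY)');

# Load both lists inside a single transaction.
$dbh->begin_work;
my $ins_black = $dbh->prepare('INSERT OR IGNORE INTO blacklist (addr) VALUES (?)');
my $ins_addr  = $dbh->prepare('INSERT OR IGNORE INTO address (addr) VALUES (?)');
$ins_black->execute($_) for @blacklist;
$ins_addr->execute($_)  for @address;
$dbh->commit;

# Let the database compute the intersection.
my $blacklisted = $dbh->selectcol_arrayref(
    'SELECT addr FROM address INTERSECT SELECT addr FROM blacklist');

print "$_\n" for @$blacklisted;
$dbh->disconnect;
```

In a real daemon the blacklist table would presumably be loaded once and kept on disk, with only the per-mailing address list inserted at run time. Whether this beats an in-memory hash on speed would have to be measured.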

```
my @BlackListed = intersect_of(\@blacklist, \@address);

## takes two array references, returns the elements common to both
sub intersect_of {
    my ($x, $y) = @_;    # list assignment unpacks both array refs
    my (%set_a, %set_b);

    ## put the larger set in %set_a
    if (@$x > @$y) {
        %set_a = map { $_ => undef } @$x;
        %set_b = map { $_ => undef } @$y;
    }
    else {
        %set_a = map { $_ => undef } @$y;
        %set_b = map { $_ => undef } @$x;
    }

    my @intersect;

    ## iterate through smaller set
    for (keys %set_b) {
        push @intersect, $_ if exists $set_a{$_};
    }

    return @intersect;
}