in reply to Re^2: Finding an intersection of two sets, lean and mean

in thread Finding an intersection of two sets, lean and mean

*The problem with huge and fast is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty, having 5 500M(++) daemons laying around is not funny.*

Ah, I misunderstood what you meant by 'huge'. Still, if memory is your concern, that sounds like an even better reason to use a DB and let the DB handle the intersection calulations. BTW, what solution for intersection handling results in a 500MB memory footprint?! I'd like to know so I can avoid that myself.

For the purpose of blacklisting, it might be small-and-fast to convert your list of addresses into a hash instead. Assuming you've already populated `@blacklist` and `@address`, your intersection sub might look like:

This exact code is untested, but I have used code like it for whitelist/blacklist processing with address list files of about 5M each, and it performed quite acceptably. YMMV, of course.my @BlackListed = intersect_of(\@blacklist, \@address); sub intersect_of ($$) { my $a, $b = @_; my (%set_a, %set_b); ## put the larger set in %set_a if (@$a > @$b) { %set_a = map { $_ => undef } @$a; %set_b = map { $_ => undef } @$b; else { %set_a = map { $_ => undef } @$b; %set_b = map { $_ => undef } @$a; } my @intersect; ## iterate through smaller set for (keys %set_b) { push @intersect, $_ if exists $set_a{$_} } return @intersect; }

Yoda would agree with Perl design: there is no `try{}`

Comment onRe^3: Finding an intersection of two sets, lean and meanDownloadCode