The problem with "huge and fast" is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty; having five 500MB-plus daemons lying around is not funny.
Ah, I misunderstood what you meant by 'huge'. Still, if memory is your concern, that sounds like an even better reason to use a DB and let the DB handle the intersection calculations. BTW, what solution for intersection handling results in a 500MB memory footprint?! I'd like to know so I can avoid it myself.
For the purpose of blacklisting, it might be small-and-fast to convert your list of addresses into a hash instead. Assuming you've already populated @blacklist and @address, your intersection sub might look like:

    my @BlackListed = intersect_of(\@blacklist, \@address);

    sub intersect_of {    # takes two array references
        my ($x, $y) = @_;    # avoid $a/$b, which are reserved for sort()
        my (%set_a, %set_b);

        ## put the larger set in %set_a
        if (@$x > @$y) {
            %set_a = map { $_ => undef } @$x;
            %set_b = map { $_ => undef } @$y;
        }
        else {
            %set_a = map { $_ => undef } @$y;
            %set_b = map { $_ => undef } @$x;
        }

        my @intersect;
        ## iterate through the smaller set, keeping keys that also exist in the larger
        for (keys %set_b) {
            push @intersect, $_ if exists $set_a{$_};
        }
        return @intersect;
    }

This exact code is untested, but I have used code like it for whitelist/blacklist processing with address list files of about 5M each, and it performed quite acceptably. YMMV, of course.
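Since memory rather than disk is the stated concern, one variation worth trying is to hash only the blacklist and stream the address list through it line by line, instead of building hashes of both lists. This is a minimal sketch, not something I've benchmarked: it assumes one address per line, `blacklisted_from_handle` is a hypothetical name, and the in-memory filehandle merely stands in for a real address file.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash from the blacklist only, then read the
# address list one line at a time, so the address list is never held in
# memory as a whole.
sub blacklisted_from_handle {
    my ($blacklist, $fh) = @_;    # array ref of blacklisted addresses, open filehandle
    my %is_black = map { $_ => 1 } @$blacklist;    # one hash key per blacklisted address
    my (@hits, %seen);
    while (my $addr = <$fh>) {
        chomp $addr;
        next if $addr eq '';    # skip blank lines
        # report each blacklisted address once, even if it repeats in the input
        push @hits, $addr if $is_black{$addr} && !$seen{$addr}++;
    }
    return @hits;
}

# Usage: an in-memory filehandle stands in for an address file (assumption).
my @blacklist = ('spam@example.com', 'junk@example.org');
my $addresses = "alice\@example.com\nspam\@example.com\nbob\@example.net\nspam\@example.com\n";
open my $fh, '<', \$addresses or die "cannot open in-memory handle: $!";
my @hits = blacklisted_from_handle(\@blacklist, $fh);
close $fh;
print "$_\n" for @hits;
```

With this layout the memory cost is roughly one hash entry per blacklisted address plus the matches found; whether that beats letting a DB do the work depends on how large the blacklist itself is.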
Yoda would agree with Perl design: there is no try{}
In reply to Re^3: Finding an intersection of two sets, lean and mean, by radiantmatrix, in thread Finding an intersection of two sets, lean and mean, by Sinister