Just had another thought. While dumping the file (probably in paced chunks, waiting for the queue to shrink back towards zero) may be sensible, you need to permute your infile somehow to ensure you don't have:
bob@domain
sue@domain
...
foo@domain
bar@other_domain
If you dump a whole series of emails to the same mail server in a row it will choke and possibly ban/throttle you. One simple approach would be to apply a sort and let the variation in username vaguely randomise the domains, or you could shuffle them in an array using a Fisher-Yates shuffle.
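A minimal sketch of the shuffle option, assuming the address list fits in memory. The sub name fisher_yates_shuffle and the sample addresses are made up for illustration; the core List::Util module also provides a ready-made shuffle if you would rather not roll your own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle of the array referenced by $list.
# (Hypothetical helper name, not part of any module.)
sub fisher_yates_shuffle {
    my ($list) = @_;
    for (my $i = $#$list; $i > 0; $i--) {
        my $j = int rand($i + 1);           # random index in 0..$i
        @$list[$i, $j] = @$list[$j, $i];    # swap (no-op when $i == $j)
    }
    return $list;
}

# Sample data only; in practice read the addresses from your infile.
my @emails = qw(bob@domain sue@domain foo@domain bar@other_domain);
fisher_yates_shuffle(\@emails);
print "$_\n" for @emails;
```

Note that a shuffle only spreads domains out on average. It does nothing to stop a very common domain turning up several times in a row.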
Provided you don't have high frequencies of gmail, hotmail or yahoo accounts, a simple sort ought to work OK; otherwise you may need some clever code to make sure that these common domains don't occur in a row.
I would probably take the easy road and try a simple sort first and check how many times a given domain occurs in your proposed concurrency frame (probably 50-100). Domains occurring more than 2-3 times within a frame may be a problem, as your MTA will be asking for that many concurrent connections.
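The check described above could be sketched roughly like this: slide a window of the proposed frame width over the sorted list and report the worst per-domain count seen in any window. The sub name max_domain_per_frame, the sample addresses and the frame width of 3 are all assumptions for illustration; the frame width is assumed to be at least 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (count, domain) for the domain that occurs most often within
# any run of $frame consecutive addresses. Hypothetical helper name.
sub max_domain_per_frame {
    my ($frame, @emails) = @_;
    my (%count, @window);
    my ($worst, $worst_domain) = (0, '');
    for my $email (@emails) {
        next unless $email =~ /\@([\w\-.]+)/;    # skip lines with no domain
        my $domain = lc $1;
        push @window, $domain;
        $count{$domain}++;
        if (@window > $frame) {                  # drop the oldest entry
            my $old = shift @window;
            delete $count{$old} unless --$count{$old};
        }
        # Only the domain just added can have set a new maximum.
        if ($count{$domain} > $worst) {
            ($worst, $worst_domain) = ($count{$domain}, $domain);
        }
    }
    return ($worst, $worst_domain);
}

# Sample data only: a plain sort on the full address, then the check.
my @emails = sort qw(bob@a.com sue@b.com al@a.com foo@a.com bar@c.com);
my ($n, $d) = max_domain_per_frame(3, @emails);
print "$d occurs up to $n times within a frame of 3\n";
```

If the reported count comes out above 2-3 for your real frame width, the plain sort is probably not enough for that list.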
Update
Could not resist. Here is a "don't hit the same domain if we have sent to it within the last n-wide frame" algorithm to run your address list through. NB: code updated to fix a bug where the entry pulled off the FIFO in the else branch was not checked against the current working domain. If the domains match, the current email needs to go onto the FIFO; if not, it is good to go (untested).
#!/usr/bin/perl
use strict;
my $frame = 10;
my @fifo = ();
my @seen = ('') x $frame;
while (my $email = <DATA>) {
    chomp $email;
    next if $email =~ m/^\s*$/;
    my $domain = get_domain($email);
    if (seen($domain, \@seen)) {
        push @fifo, [$domain, $email];
    }
    else {
        # try to pull next problem email off fifo buffer
        if (@fifo and not seen($fifo[0]->[0], \@seen)) {
            cout($fifo[0]->[1]);
            shift @seen;
            push @seen, $fifo[0]->[0];
            my $fifo = shift @fifo;
            # make sure current email is not same domain
            # that we just pulled off fifo....
            if ( $fifo->[0] eq $domain ) {
                push @fifo, [$domain, $email];
                next;
            }
        }
        cout($email);
        shift @seen;
        push @seen, $domain;
    }
}
# we have failed if our fifo is not empty, solution decrease frame width
if (@fifo) {
    cout($_->[1]) for @fifo;
    die sprintf("Still had fifo buffer of length %d with frame width %d\n",
        scalar @fifo, $frame);
}
sub cout { print "$_[0]\n" }
sub get_domain {
    my ($email) = @_;
    (my $domain) = $email =~ m/\@([\w\-\.]+)/;
    die "Can't find domain for $email at line $.\n" unless $domain;
    return $domain;
}
sub seen {
    my ($domain, $seen) = @_;
    for (@$seen) {
        return 1 if $_ eq $domain;
    }
    return 0;
}
__DATA__
b@b.com
b@b.com
b@b.com
b@b.com
b@b.com
b@b.com
b@b.com
b@b.com
b@b.com
b@b.com
a@aa.com
a@ab.com
a@ac.com
a@ad.com
a@ae.com
a@af.com
a@ag.com
a@ah.com
a@ai.com
a@aj.com
a@ak.com
a@al.com
a@am.com
a@an.com
a@ao.com
a@ap.com
a@aq.com
a@ar.com
a@as.com
a@at.com
a@au.com
a@av.com
a@aw.com
a@ax.com
a@ay.com
a@az.com
a@ba.com
a@bb.com
a@bc.com
a@bd.com
a@be.com
a@bf.com
a@bg.com
a@bh.com
a@bi.com
a@bj.com
a@bk.com
a@bl.com
a@bm.com
a@bn.com
a@bo.com
a@bp.com
a@bq.com
a@br.com
a@bs.com
a@bt.com
a@bu.com
a@bv.com
a@bw.com
a@bx.com
a@by.com
a@bz.com