neilwatson has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to understand multi-threaded processes, but I'm sure I've missed something. The following script identifies fake email addresses (yes, I know no address checker is 100% accurate) that some of our customers give when filling in web forms (there is a big fat opt-in/opt-out question, so it's nothing sleazy). The list of addresses is long, so I wanted to use threading to speed up the process:

#!/usr/bin/perl -w
#checks for valid email address
#usage validemail <file containing email addresses>

use warnings;
use strict;
use Email::Valid::Loose;
use Net::DNS;
use Parallel::ForkManager;
use Fcntl qw/:flock :seek/;

my $pm=new Parallel::ForkManager(20);
my $resolver=Net::DNS::Resolver->new();
my $addrfile = $ARGV[0];
my ($is_valid, $host, $x, @mx, $add, @adds, $FH);

#custom words that make emails invalid to you
my @custom = qw( postmaster webmaster );

open (EMAILS, "$addrfile");

while (<EMAILS>){
    $_ =~ s/\015//;
    chomp $_;
    push @adds, $_;
}

close (EMAILS);

#warning, I will delete existing files
$FH = "BADADDR";
open (FH, ">badmails") || die;
$FH = "GOODADDR";
open (FH, ">goodmails") || die;

foreach $add (@adds){
    $pm->start and next;

    foreach $x (@custom){
        if ($add =~ m/$x/){
            $FH = "BADADDR";
            writeaddr(); #address is bad
            $pm->finish;
        }
    }

    #if email is invalid move on
    if (!defined(Email::Valid::Loose->address($add))){
        $FH = "BADADDR";
        writeaddr(); #address is bad
        $pm->finish;
    }

    #if email is valid get domain name
    $is_valid = Email::Valid::Loose->address($add);
    if ($is_valid =~ m/\@(.*)$/) {
        $host = $1;
    }

    $is_valid="";

    # perform dsn lookup to check domain
    @mx=mx($resolver, $host);

    if (@mx) {
        $FH = "GOODADDR";
        writeaddr(); #address is good
    }else{
        $FH = "BADADDR";
        writeaddr(); #address is bad
    }
    $pm->finish;
}

$pm->wait_all_children;
close (FH);
close (FH);

sub writeaddr{
    flock FH, LOCK_EX or die "Flock failed: $!\n";
    seek FH, SEEK_END, 0 or die "Seek failed: $!\n";
    print FH "$add\n";
    flock FH, LOCK_UN or die "unFlock failed: $!\n";
}

I receive no error messages, but nothing is written to the files. What have I missed?

Neil Watson
watson-wilson.ca

Re: using Parallel::ForkManager and FileHandles.
by ehdonhon (Curate) on Jun 11, 2002 at 01:33 UTC

    See Those fork()ing flock()ers... for some information on this very issue.
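One way to sidestep the shared-handle problem, offered as an assumption rather than as what the linked node prescribes: have each child open the output file itself in append mode and lock that private handle. Where Perl's flock is built on flock(2), a handle opened before fork() shares its lock with every child, so children locking the inherited handle do not exclude one another. The helper name append_line is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append one line to $path under an exclusive lock.
# The file is opened inside the calling process, so each forked child
# that calls this gets its own handle and therefore its own lock.
sub append_line {
    my ($path, $line) = @_;

    open(my $fh, '>>', $path) or die "Cannot open $path: $!\n";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!\n";
    print {$fh} "$line\n"     or die "Cannot write to $path: $!\n";

    # close() flushes the buffer first and then releases the lock.
    close($fh)                or die "Cannot close $path: $!\n";
    return;
}
```

A child would then call append_line('badmails', $add) instead of writing through a handle inherited from the parent. Because the files are opened for appending, the parent would need to empty or remove old output files before the loop starts.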

    Also, may I suggest the following change:

    my @custom = ( qr/postmaster/, qr/webmaster/ );
    ...
    foreach $x (@custom){
        if ($add =~ $x){
            $FH = "BADADDR";
            writeaddr(); #address is bad
            $pm->finish;
        }
    }

    By compiling your regular expressions only once, before you fork, you may save some time, especially if you have a very long list of e-mails.
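The suggestion above can be tried in isolation. This is a minimal sketch in which the patterns are built once with qr// and then reused for every address; the helper name is_blocked is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Patterns are compiled once, here, before any fork() would happen.
my @custom = ( qr/postmaster/, qr/webmaster/ );

# Hypothetical helper: returns 1 if the address matches any blocked
# pattern, 0 otherwise.
sub is_blocked {
    my ($address) = @_;
    for my $pattern (@custom) {
        return 1 if $address =~ $pattern;
    }
    return 0;
}
```

With m/$x/ and a changing $x, Perl has to recompile the pattern whenever the interpolated string differs from the previous one; matching against a qr// object reuses the compiled form.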

Re: using Parallel::ForkManager and FileHandles.
by neilwatson (Priest) on Jun 12, 2002 at 15:49 UTC
    Finished. Thanks to many brothers and sisters:

    #!/usr/bin/perl -w
    #checks for valid email address
    #usage validemail <file containing email addresses>

    use warnings;
    use strict;
    use Email::Valid::Loose;
    use Net::DNS;
    use Parallel::ForkManager;
    use Fcntl qw/:flock :seek/;

    # 20 is the maximum number of forked child processes.
    # Increase at your own risk.
    # If your box takes a performance hit decrease this number.
    my $pm=new Parallel::ForkManager(20);
    my $resolver=Net::DNS::Resolver->new();
    my $addrfile = $ARGV[0];
    my ($is_valid, $host, $x, @mx, $add, @adds);

    #custom words that make emails invalid to you
    my @custom = ( qr/postmaster/i, qr/webmaster/i );

    open (EMAILS, "$addrfile");

    while (<EMAILS>){
        $_ =~ s/\015//;
        chomp $_;
        push @adds, $_;
    }

    close (EMAILS);

    #warning, I will delete existing files
    open (BADADDR, ">badmails") || die;
    open (GOODADDR, ">goodmails") || die;

    foreach $add (@adds){
        $pm->start and next;

        foreach $x (@custom){
            if ($add =~ $x){
                writeaddr(*BADADDR, $add); #address is bad
                $pm->finish;
            }
        }

        #if email is invalid move on
        if (!defined(Email::Valid::Loose->address($add))){
            writeaddr(*BADADDR, $add); #address is bad
            $pm->finish;
        }

        #if email is valid get domain name
        $is_valid = Email::Valid::Loose->address($add);
        if ($is_valid =~ m/\@(.*)$/) {
            $host = $1;
        }

        $is_valid="";

        # perform DNS lookup to check domain
        @mx=mx($resolver, $host);

        if (@mx) {
            writeaddr(*GOODADDR, $add); #address is good
        }else{
            writeaddr(*BADADDR, $add); #address is bad
        }
        $pm->finish;
    }

    $pm->wait_all_children;
    close (BADADDR);
    close (GOODADDR);

    sub writeaddr{
        my $FH = $_[0];
        my $address = $_[1];

        flock $FH, LOCK_EX or die "Flock failed: $!\n";
        seek $FH, 0, 2 or die "Seek failed: $!\n";
        print $FH "$address\n";
        flock $FH, LOCK_UN or die "unFlock failed: $!\n";
    }
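For comparison, the same handle-passing idea in writeaddr() above can be sketched with a lexical filehandle and the named SEEK_END constant in place of the literal 2. The name write_address is hypothetical, and this is only one possible spelling, not a correction of the posted script. Note that seek takes the position first and the whence value last.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Hypothetical variant of writeaddr(): takes an already opened
# filehandle and the address to record.
sub write_address {
    my ($fh, $address) = @_;

    flock($fh, LOCK_EX)      or die "flock failed: $!\n";
    seek($fh, 0, SEEK_END)   or die "seek failed: $!\n";   # offset 0 from end of file
    print {$fh} "$address\n" or die "print failed: $!\n";
    flock($fh, LOCK_UN)      or die "unlock failed: $!\n"; # Perl flushes before unlocking
    return;
}
```

A caller would open the file with something like open(my $bad, '>', 'badmails') and pass $bad as the first argument, which avoids the typeglob syntax (*BADADDR).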

    Comments welcome.

    Neil Watson
    watson-wilson.ca