neilwatson has asked for the wisdom of the Perl Monks concerning the following question:

This program verifies a list of email addresses via a web interface. It works except for one problem (so far): the part where it prints to the browser that it will email the results gets duplicated for every email address on the list. Since the ForkManager doesn't start until after that, I'm at a loss as to why this is happening. I'm sure it has to do with my inability to fully grasp OO programming :(

Can some kind soul offer me some advice?

#!/usr/bin/perl -w
#checks for valid email address
use warnings;
use strict;
use Email::Valid::Loose;
use Net::DNS;
use Parallel::ForkManager;
use Fcntl qw/:flock :seek/;
use CGI qw/:standard/;
use Proc::ProcessTable;
use MIME::Lite;

#limit file to 500k in size
$CGI::POST_MAX = 1024 * 500;

my $cgi      = new CGI or die "new CGI: $!\n";
my $addrfile = $cgi->param('emailist');
my $email    = $cgi->param('email');
my $t        = new Proc::ProcessTable;
my $pm       = new Parallel::ForkManager(10);
my $resolver = Net::DNS::Resolver->new();
my ($is_valid, $host, @mx, $add, @adds, $num, $p, $msg, $str);
my $x = 0;

#check to see if this process is already running
foreach $p ( @{$t->table} ) {
    if ($p->cmndline =~ m/^validemail/i){ $x++ }
    if ($x > 1){
        #tell user to try again later and quit.
        print header, start_html;
        print "\t<meta http-equiv=\"refresh\" content=\"5; url=http://marketing.mydomain.com\">\n";
        print h1('Sorry,'),
              h2('The email validator is already in use by another user.'),
              h2('Please try again later...'),
              end_html;
        die "validemail already running";
    }
}

#THIS IS THE PART THAT DUPLICATES :(
#begin checking email
print header, start_html;
print "\t<meta http-equiv=\"refresh\" content=\"5; url=http://marketing.mydomain.com\">\n";
print h1('Validating your email list.'),
      h2('I will email you results when I am finished.'),
      h2('This could take up to 1 hour.'),
      end_html;

#custom words that make emails invalid to you
my @custom = ( qr /mydomain/i , qr /postmaster/i , qr /webmaster/i );

#remove troublesome windows \r characters
while (<$addrfile>){
    $_ =~ s/\015//;
    chomp $_;
    push @adds, $_;
}

#clear file and memory
$addrfile = 0;

#check that file does not already exist.
#use random number to create temp files.
srand;
$num = int (rand 10000)+1;
while (-e "badmails$num.txt" && -e "goodmails$num.txt"){
    $num = int (rand 10000)+1;
}
open (BADADDR,  ">/tmp/badmails$num.txt")  || die;
open (GOODADDR, ">/tmp/goodmails$num.txt") || die;

#process each address
foreach $add (@adds){
    $pm->start and next;

    foreach $x (@custom){
        if ($add =~ $x){
            writeaddr(*BADADDR, $add);   #address is bad
            $pm->finish;
        }
    }

    #if email is invalid move on
    if (!defined(Email::Valid::Loose->address($add))){
        writeaddr(*BADADDR, $add);       #address is bad
        $pm->finish;
    }

    #if email is valid get domain name
    $is_valid = Email::Valid::Loose->address($add);
    if ($is_valid =~ m/\@(.*)$/) {
        $host = $1;
    }
    $is_valid = "";

    #perform dns lookup to check domain
    @mx = mx($resolver, $host);
    if (@mx) {
        writeaddr(*GOODADDR, $add);      #address is good
    }else{
        writeaddr(*BADADDR, $add);       #address is bad
    }
    $pm->finish;
}
$pm->wait_all_children;

close (BADADDR);
close (GOODADDR);

#email results
$msg = MIME::Lite->new(
    From    => 'marketing@mydomain.com',
    To      => "$email",
    Subject => 'Validated email list',
    Type    => 'multipart/mixed'
);
$msg->attach(
    Type => 'TEXT',
    Data => "Here are your email lists (good and bad)"
);
$msg->attach(
    Type        => 'text/plain',
    Path        => "/tmp/goodmails$num.txt",
    Filename    => "goodmails$num.txt",
    Disposition => 'attachment'
);
$msg->attach(
    Type        => 'text/plain',
    Path        => "/tmp/badmails$num.txt",
    Filename    => "badmails$num.txt",
    Disposition => 'attachment'
);
$str = $msg->as_string;
$msg->send;

system("rm -f /tmp/goodmails$num.txt");
system("rm -f /tmp/badmails$num.txt");

#write to files but ensure proper locking
sub writeaddr{
    my $FH      = $_[0];
    my $address = $_[1];
    flock $FH, LOCK_EX or die "Flock failed: $!\n";
    seek $FH, 0, 2     or die "Seek failed: $!\n";
    print $FH "$address\n";
    flock $FH, LOCK_UN or die "unFlock failed: $!\n";
}

Neil Watson
watson-wilson.ca


Re: parallel forkmanager and unexpected duplicates.
by RMGir (Prior) on Jul 12, 2002 at 16:20 UTC
    Did you try setting STDOUT to autoflush?
    $|=1;
    I'm completely guessing, but if there's stuff in the STDOUT buffer, and then the child processes flush STDOUT on exit, that could explain the dupes...

    If that's the case, you should also make sure you select BADADDR and GOODADDR before the fork and do the same for them; I'm not sure if the unlock would do the flush for you.

    (Edit: The seek would do the flush, but the print happens AFTER the seek.)

    my $oldfh = select(BADADDR);  $| = 1;
    select(GOODADDR);             $| = 1;
    select($oldfh);               # reselect STDOUT
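
    To see the effect in isolation, here's a rough standalone demo (not your script; the loop is made up). Run it with output piped or redirected to a file so STDOUT is block-buffered, the way it is under CGI:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parallel::ForkManager;

    # Without $| = 1 this line can sit in STDOUT's buffer; each child
    # inherits a copy of that buffer and flushes it again when it exits,
    # so the line can show up once per child on top of the parent's copy.
    print "I will email you results when I am finished.\n";

    my $pm = new Parallel::ForkManager(3);
    foreach my $i (1 .. 3) {
        $pm->start and next;   # parent keeps looping; child falls through
        # the child would do its real work here
        $pm->finish;           # child exits here
    }
    $pm->wait_all_children;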

    --
    Mike
      From the manual page of fork:

          Beginning with v5.6.0, Perl will attempt to flush all files
          opened for output before forking the child process, but this
          may not be supported on some platforms (see the perlport
          manpage).  To be safe, you may need to set "$|" ($AUTOFLUSH
          in English) or call the "autoflush()" method of "IO::Handle"
          on any open handles in order to avoid duplicate output.
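
      Applied to the script above, that would be something along these lines (handle names taken from your code), placed before the loop that calls $pm->start and after the open()s for the two files:

      use IO::Handle;

      STDOUT->autoflush(1);     # flush the buffered browser output before the forks
      BADADDR->autoflush(1);    # and keep the result files the children write unbuffered
      GOODADDR->autoflush(1);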
      

      Abigail

      Good guess!

      Adding $|=1; after my print to the browser did the trick.
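
      For the record, it ended up looking roughly like this:

      print h1('Validating your email list.'),
            h2('I will email you results when I am finished.'),
            h2('This could take up to 1 hour.'),
            end_html;
      $| = 1;   # flush STDOUT now, before Parallel::ForkManager starts forking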

      Thanks,

      Neil Watson
      watson-wilson.ca

Re: parallel forkmanager and unexpected duplicates.
by kvale (Monsignor) on Jul 12, 2002 at 16:39 UTC
    I'm no expert on Parallel::ForkManager, but the module author recommends this code structure:
    use LWP::Simple;
    use Parallel::ForkManager;
    ...
    @links=(
        ["http://www.foo.bar/rulez.data","rulez_data.txt"],
        ["http://new.host/more_data.doc","more_data.doc"],
        ...
    );
    ...
    # Max 30 processes for parallel download
    my $pm = new Parallel::ForkManager(30);

    foreach my $linkarray (@links) {
        $pm->start and next; # do the fork

        my ($link,$fn) = @$linkarray;
        warn "Cannot get $fn from $link"
            if getstore($link,$fn) != RC_OK;

        $pm->finish; # do the exit in the child process
    }
    $pm->wait_all_children;
    with the declaration of a new object right before the for loop. Your code has it at the start of your program - perhaps this is causing the extraneous copies.
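
    In your script that would mean something like this (untested, just moving the constructor down from the top of the program to right before the loop):

    open (BADADDR,  ">/tmp/badmails$num.txt")  || die;
    open (GOODADDR, ">/tmp/goodmails$num.txt") || die;

    # construct the ForkManager right before the loop that uses it
    my $pm = new Parallel::ForkManager(10);

    #process each address
    foreach $add (@adds){
        $pm->start and next;   # fork; parent moves on to the next address
        # ... validity checks and writeaddr() calls as before ...
        $pm->finish;           # child exits
    }
    $pm->wait_all_children;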

    -Mark