neilwatson has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that validates a list of email addresses. In order to save time I have the script running several forks.
    #!/usr/bin/perl
    #checks for valid email address
    #usage validemail <file containing email addresses>

    use warnings;
    use strict;
    use Mail::CheckUser;
    use Parallel::ForkManager;
    use Fcntl qw/:flock :seek/;

    my $pm=new Parallel::ForkManager(20);
    my $addrfile = $ARGV[0]
        || die "Usage validemail <file containing email addresses> Will return two files: goodmails.csv and badmails.csv. If these files exist already they will be deleted.";
    my ($is_valid, $host, $x, @mx, $add, @adds);

    #custom words that make emails invalid to you
    my @custom = qw/ postmaster webmaster /;
    my $regex = join "|", @custom;
    $regex = qr/$regex/;

    open (EMAILS, "$addrfile");

    #remove troublesome windows \r characters
    #and leading whitespace
    while (<EMAILS>){
        $_ =~ s/\015//;
        $_ =~ s/^\s*//;
        chomp $_;
        push @adds, $_;
    }
    close (EMAILS);

    #warning, I will delete existing files
    open (BADADDR, ">badmails.csv") || die;
    open (GOODADDR, ">goodmails.csv") || die;

    #remove custom regexes
    $x = 0;
    while ($x <= $#adds){
        if ($adds[$x] =~ m/$regex/){
            splice @adds, $x, 1;
        }else{
            $x++;
        }
    }

    #when using Mail::CheckUser set
    #these variables
    #timeout on DNS and SMTP network checks
    $Mail::CheckUser::Timeout = 10;

    foreach $add (@adds){
        $pm->start and next;
        if (Mail::CheckUser::check_email($add)){
            writeaddr(*GOODADDR, $add); #address is good
            $pm->finish;
        }else{
            writeaddr(*BADADDR, $add); #address is bad
            $pm->finish;
        }
    }
    $pm->wait_all_children;
    close (BADADDR);
    close (GOODADDR);

    sub writeaddr{
        my $FH = $_[0];
        my $address = $_[1];
        flock $FH, LOCK_EX or die "Flock failed: $!\n";
        seek $FH, 0, 2 or die "Seek failed: $!\n";
        print $FH "$address\n";
        flock $FH, LOCK_UN or die "unFlock failed: $!\n";
    }

The script works well except for one problem. Occasionally, a line in one of the address files is written incompletely. For example, instead of the address myname@foo.org, the file contains only oo.org. My thought is that it has something to do with flock, but I'm not sure how. Does anyone have any ideas?

Neil Watson
watson-wilson.ca

Re: Strange flock results?
by hv (Prior) on Apr 27, 2004 at 13:16 UTC

    In the writing routine:

        flock $FH, LOCK_EX or die "Flock failed: $!\n";
        seek $FH, 0, 2 or die "Seek failed: $!\n";
        print $FH "$address\n";
        flock $FH, LOCK_UN or die "unFlock failed: $!\n";

    at the point you release the lock the printed address will usually still be in the output buffer in memory, not written to disk.

    See 'perldoc -q flush' for various ways to ensure that the output is flushed to disk before you release the lock.

    Hugo
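
One of the options that 'perldoc -q flush' describes is an explicit flush through IO::Handle. The following is only a minimal sketch of the writing routine with a flush added before the unlock. The name write_addr_flushed and the scratch file are hypothetical, chosen for illustration, and are not part of the original script.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);
use IO::Handle;                 # provides the flush() method on filehandles
use File::Temp qw(tempfile);

# Append one address under an exclusive lock, flushing before the unlock
# so the buffered line reaches the file while the lock is still held.
# (write_addr_flushed is a hypothetical name used for this sketch.)
sub write_addr_flushed {
    my ($fh, $address) = @_;
    flock($fh, LOCK_EX)      or die "flock failed: $!\n";
    seek($fh, 0, SEEK_END)   or die "seek failed: $!\n";
    print {$fh} "$address\n" or die "print failed: $!\n";
    $fh->flush               or die "flush failed: $!\n";
    flock($fh, LOCK_UN)      or die "unlock failed: $!\n";
    return;
}

# Usage with a scratch file standing in for goodmails.csv.
my ($scratch_fh, $scratch_path) = tempfile(UNLINK => 1);
write_addr_flushed($scratch_fh, 'myname@foo.org');
```

The handle stays open after the call, so the same routine can be called repeatedly from one process.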

Re: Strange flock results?
by sgifford (Prior) on Apr 27, 2004 at 15:49 UTC
    It may be an issue with the current file location being at the wrong place, in which case seeking to the end will help. This example is from Perl's documentation for flock:
        sub lock {
            flock(MBOX,LOCK_EX);
            # and, in case someone appended
            # while we were waiting...
            seek(MBOX, 0, 2);
        }
    You could also try opening the file in append mode, which instructs the OS to take care of this for you. If you want to truncate the file immediately after opening it, you can use truncate:
        open (BADADDR, ">>badmails.csv") || die;
        truncate (BADADDR,0) || die;
        open (GOODADDR, ">>goodmails.csv") || die;
        truncate (GOODADDR,0) || die;
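
To illustrate what append mode does, here is a small sketch, assuming two independently opened handles on the same scratch file (the handle names, the temporary directory and the example.org addresses are made up for the demonstration). Each handle is unbuffered, so every print becomes one write, and the OS places each write at the current end of the file without any explicit seek.

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush() method
use File::Temp qw(tempdir);

# Scratch location standing in for the real goodmails.csv (assumption).
my $append_dir  = tempdir(CLEANUP => 1);
my $append_file = "$append_dir/goodmails.csv";

# Open in append mode, then empty the file, as suggested above.
open(my $first, '>>', $append_file)  or die "open failed: $!\n";
truncate($first, 0)                  or die "truncate failed: $!\n";

# A second, independent handle on the same file, also in append mode.
open(my $second, '>>', $append_file) or die "open failed: $!\n";

# Unbuffered output: each print is handed to the OS immediately.
$_->autoflush(1) for $first, $second;

# Interleaved writes; append mode puts each one at the end of the file.
print {$first}  "a\@example.org\n";
print {$second} "b\@example.org\n";
print {$first}  "c\@example.org\n";

close($first)  or die "close failed: $!\n";
close($second) or die "close failed: $!\n";
```

Neither handle ever seeks, yet no line overwrites another, because the position of each write is chosen by the OS at write time.
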
Re: Strange flock results?
by eserte (Deacon) on Apr 27, 2004 at 13:17 UTC
    Maybe output buffering is the problem here. Try setting $| or autoflush to a true value on your filehandles.
      Ever since at least 5.004, perl flushes filehandles before (un)locking a file. See perl5004delta.pod.

      Abigail
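
For reference, the two ways of turning on autoflush mentioned in this subthread can be sketched as follows. This is an illustration only: the handle names, the scratch files and the sample addresses are hypothetical, and whether autoflush is needed at all depends on the point about perl 5.004 made above.

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush() method
use File::Temp qw(tempfile);

# Scratch files standing in for goodmails.csv and badmails.csv (assumption).
my ($good_fh, $good_path) = tempfile(UNLINK => 1);
my ($bad_fh,  $bad_path)  = tempfile(UNLINK => 1);

# Method form: enable autoflush on one handle.
$good_fh->autoflush(1);

# Older idiom: select the handle, set $|, then restore the previous handle.
my $previous = select($bad_fh);
$| = 1;
select($previous);

print {$good_fh} "myname\@foo.org\n";
print {$bad_fh}  "nobody\@invalid.example\n";

# Read a file back through a separate handle (helper for the demonstration).
sub slurp_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "open failed: $!\n";
    local $/;                   # slurp mode: read the whole file at once
    return scalar <$in>;
}

# Neither output handle has been closed, yet both lines are already readable.
my $good_seen = slurp_file($good_path);
my $bad_seen  = slurp_file($bad_path);
```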