neilwatson has asked for the wisdom of the Perl Monks concerning the following question:

This script has started reporting errors when it was moved from a Redhat 6.2 box (Perl 5.6.1) to a Redhat 9.0 box (Perl 5.8.0). The script:
#!/usr/bin/perl
#checks for valid email address
#usage validemail <file containing email addresses>

use warnings;
use strict;
use Email::Valid::Loose;
use Net::DNS;
use Parallel::ForkManager;
use Fcntl qw/:flock :seek/;

my $pm=new Parallel::ForkManager(20);
my $resolver=Net::DNS::Resolver->new();
my $addrfile = $ARGV[0] || die "Usage validemail <file containing email addresses>
Will return two files: goodmails.csv and badmails.csv.
If these files exits already they will be deleted.
";
my ($is_valid, $host, $x, @mx, $add, @adds);

#custom words that make emails invalid to you
my @custom = qw/ postmaster webmaster /;
my $regex = join "|", @custom;
$regex = qr/$regex/;

open (EMAILS, "$addrfile");

#remove troublesome windows /r characters
#and leading whitespace
while (<EMAILS>){
    $_ =~ s/\015//;
    $_ =~ s/^\s*//;
    chomp $_;
    push @adds, $_;
}
close (EMAILS);

#warning, I will delete existing files
open (BADADDR, ">badmails.csv") || die;
open (GOODADDR, ">goodmails.csv") || die;

#remove custom regexes
$x = 0;
while ($x <= $#adds){
    if ($adds[$x] =~ m/$regex/){
        splice @adds, $x, 1;
    }else{
        $x++;
    }
}

foreach $add (@adds){
    $pm->start and next;

    #if email is invalid move on
    if (!defined(Email::Valid::Loose->address($add))){
        writeaddr(*BADADDR, $add); #address is bad
        $pm->finish;
    }

    #if email is valid get domain name
    $is_valid = Email::Valid::Loose->address($add);
    if ($is_valid =~ m/\@(.*)$/) {
        $host = $1;
    }
    $is_valid="";

    # perform dsn lookup to check domain
    @mx=mx($resolver, $host);
    if (@mx) {
        writeaddr(*GOODADDR, $add); #address is good
    }else{
        writeaddr(*BADADDR, $add); #address is bad
    }
    $pm->finish;
}
$pm->wait_all_children;

close (BADADDR);
close (GOODADDR);

sub writeaddr{
    my $FH = $_[0];
    my $address = $_[1];
    flock $FH, LOCK_EX or die "Flock failed: $!\n";
    seek $FH, 0, 2 or die "Seek failed: $!\n";
    print $FH "$address\n";
    flock $FH, LOCK_UN or die "unFlock failed: $!\n";
}

The errors:

Unrecognised line: user1@foo.com at /usr/lib/perl5/site_perl/5.8.0/Email/Valid.pm line 232
Unrecognised line: user2@foo.com at /usr/lib/perl5/site_perl/5.8.0/Email/Valid.pm line 232
Unrecognised line: user3@foo.com at /usr/lib/perl5/site_perl/5.8.0/Email/Valid.pm line 232

Anyone have any ideas as to the cause?

Neil Watson
watson-wilson.ca

update (broquaint): added <readmore> tags

Re: Email::Valid::Loose failing on new box?
by bbfu (Curate) on Aug 29, 2003 at 18:35 UTC

    The error you're getting originates from the _tokenise sub in Mail::Address, called from parse, which is called from address in Email::Valid (which is where line 232 is, at least in my copy).

    As far as I can tell, it means that your addresses don't match the regexps near the end of the _tokenise sub. Why they would suddenly stop matching, I can only guess. It could be that the data in the file you're reading in changed slightly when copied between boxes. Extra newlines, or newline problems, perhaps? It could be that the regexps used by Mail::Address have changed, and you had an older version of the module installed on the old box. You could check the source of the two copies of Mail::Address on the two machines, and compare the regexps.
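    One way to act on both guesses is to make hidden characters in the input visible and to print the installed Mail::Address version on each box. The following is only a minimal sketch: `visible` is a hypothetical helper name, and the two sample lines stand in for lines read from the real address file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the string with every character outside
# printable ASCII replaced by a visible \x{..} escape.
sub visible {
    my ($str) = @_;
    $str =~ s/([^\x20-\x7e])/sprintf('\\x{%02x}', ord $1)/ge;
    return $str;
}

# Sample lines standing in for the contents of the address file.
# The first one carries a Windows-style line ending.
for my $line ("user1\@foo.com\r\n", "user2\@foo.com\n") {
    print visible($line), "\n";
}

# Report the installed Mail::Address version, if the module can be loaded,
# so the output from the two machines can be compared.
my $version = eval { require Mail::Address; Mail::Address->VERSION };
print 'Mail::Address version: ',
    (defined $version ? $version : 'not available'), "\n";
```

    Running this on both machines against the same input file should show whether the data or the module version differs.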

    Hope that helps. Good luck.

    bbfu
    Black flowers blossom
    Fearless on my breath

Re: Email::Valid::Loose failing on new box?
by princepawn (Parson) on Aug 29, 2003 at 16:59 UTC
    1. Wow! Parallel::ForkManager is awesome. This post was worth a read if for no reason other than to find out about that module.
    2. Perhaps you could point out where line 232 is... nl can be used to number lines for you.
    3. I am including an email that I wrote to my IT department here but never sent out. Since you are trying to validate email addresses, you may find it useful:

    There is a set of email addresses which are valid, per RFC 2822, call this set A.

    The set of email addresses accepted by ISPs is sometimes a proper superset of set A and sometimes a proper subset of set A.

    For example, an AOL email address can only contain letters and numbers. There is no typo in my last sentence. I am not drunk or otherwise intoxicated. What you see is what AOL tech support told me.

    Example 2: go to Excite.com sometime when you have nothing to do and choose an email address like this: --perlhacker--@excite.com And lo and behold, it works!

    Conclusion: so we see that AOL accepts a proper subset of set A and Excite accepts a proper superset of set A. I cringe in fear at continuing my investigation with other ISPs.

    Even if an ISP accepts a proper superset of A, that does not mean that a mail sending program will send a mail addressed to such an email address...

    Just see the email that bounced after I registered --metaperl--@excite.com... Excite says it's fine, but our ISP's mailer barfs on it.

    This lengthy email is the result of staring at logs of rejected emails and wondering why people put certain things in the email field.

    We have a slew of people who put email.@aol.com (a dot just before the @ sign). We also have a healthy number of janee@yahoo..com (which is probably just a spelling error, but you never know). And of course, what would life be like without --roy--@excite.com (which is great for good ol' roy, but unacceptable to certain mailers).
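    The dot problems in the first two addresses can be caught before a full validation pass. The following is a rough sketch, not a replacement for Email::Valid: `dots_ok` is a hypothetical helper that only checks dot placement on each side of the @ sign, which is one of the constraints of the RFC 2822 dot-atom form. Note that it accepts --roy--@excite.com, since hyphens are not the issue this check looks for.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the address has exactly one @ and neither
# side starts with a dot, ends with a dot, or contains two dots in a row.
sub dots_ok {
    my ($addr) = @_;
    my ($local, $domain) = $addr =~ /\A([^\@]+)\@([^\@]+)\z/
        or return 0;
    for my $part ($local, $domain) {
        return 0 if $part =~ /\A\.|\.\z|\.\./;
    }
    return 1;
}

# The three addresses mentioned above.
for my $addr ('email.@aol.com', 'janee@yahoo..com', '--roy--@excite.com') {
    printf "%-22s %s\n", $addr, dots_ok($addr) ? 'dots ok' : 'bad dots';
}
```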

    Conclusions

    1. certain email addresses should never have made it past the input form
    2. I have no choice but to only accept valid email addresses. Anything else is playing prophet

    ----- Original Message -----
    From: "Mail Delivery System" <MAILER-DAEMON@lsh144.siteprotect.com>
    To: <tbone@directsynergy.com>
    Sent: Thursday, August 28, 2003 9:46 AM
    Subject: Undelivered Mail Returned to Sender

    > This is the Postfix program at host s1.mail-out.isp.lax.eggn.net.
    >
    > I'm sorry to have to inform you that the message returned
    > below could not be delivered to one or more destinations.
    >
    > For further assistance, please send mail to <postmaster>
    >
    > If you do so, please include this problem report. You can
    > delete your own text from the message returned below.
    >
    > The Postfix program
    >
    > <--metaperl--@excite.com>: invalid recipient syntax: "--metaperl--@excite.com"
    >

    Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality... download and use The Emacs Code Browser