tanger has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

ahhh another email validation question.
First I just want to say that I did search on this topic and read all the possibilities. I understand that there is no complete regex to actually determine whether an e-mail address is valid/deliverable.

I'm making a simple newsletter script and want to verify / untaint the e-mail input field. I'm planning to use a validation code authentication method, where it will send an email to the subscriber asking him to click a link to validate his subscription.

The problem here is that since I'm using this "validation link" method, I only want to verify that an e-mail address is typed properly (I really don't want to use two e-mail address fields). That being said, I don't really care whether the e-mail address is actually deliverable. However, if someone subscribes and the e-mail ends up bouncing, then he would never receive the validation code. The thing with bounced e-mails is that a valid, working e-mail address can still bounce due to an error such as a full mailbox. If the e-mail does bounce, is there any way I can make my Perl script notify the subscriber through HTML?

In the end, I'm really leaning towards using Email::Valid w/out the DNS lookup.

Should I be choosing any other options/route? Perhaps Email::Valid::Loose is even better?

Thanks!! tanger

Re: Yet Another E-mail Validation Question
by jasonk (Parson) on Apr 20, 2005 at 22:35 UTC

    Basically what you are asking is "can I find out if an email address is valid without sending email to it." The short answer is no.

    The best you can do is send out the validation mail blind, then have the form tell them to expect it, and that if they don't get it there is probably an email problem. In theory you could watch for bounces and then notify the person who submitted the form that it bounced, but they would have to stay on the page, or check back periodically, for a very long time. For example, if the mail can't be delivered because all their mail servers are down, it could take days to bounce; most mail servers will keep retrying for 4 days in this circumstance.


    We're not surrounded, we're in a target-rich environment!
      Okay, I'm a little confused now.

      Does the Email::Valid module actually send out an e-mail to the user blind, like you just said?

      Also, I've been hearing a lot about Email::Valid on this board, but no one seems to recommend using Email::Valid::Loose. Doesn't it make more sense since it can verify a whole entire range of e-mail addresses? I know a few people who have e-mail addresses that contain a "." before the "@", e.g. joe.smith@comcast.com

      ty! tanger

        Email::Valid does not actually attempt delivery. From its documentation:

        This module determines whether an email address is well-formed, and optionally, whether a mail host exists for the domain.

        Please note that there is no way to determine whether an address is deliverable without attempting delivery (for details, see perlfaq 9).

        So, let's start today's SMTP lesson:

        1. When you send an e-mail message, you check to see if there's an MX record in DNS for the domain. If there isn't, you check to see if there's an A record in DNS.
        2. You then attempt to connect to port 25 of whatever host you found.
        3. Once you connect, you send HELO host.domain.tld (I won't get into EHLO (extended hello) at this time).
        4. In the old days (before spammers started abusing it), you could use VRFY or EXPN to test an e-mail address, but few servers allow that these days. So, we have to send a message. You start by telling the server who the message is from: MAIL FROM:<sender@your.domain.tld>
        5. In the next line, you tell it who the message is going to: RCPT TO:<recipient@host.domain.tld>
        6. And then you send your message. You send the line DATA and then the header of your message, a blank line, the body of the message, and then, on a line by itself, a single period.
        7. You then disconnect, by sending QUIT (unless you have more mail for that domain, in which case, you go back to 'MAIL FROM:').

        Depending on whether the address is valid, you may get a response back from the mail server immediately (or you may not find a host, or not be able to connect to port 25, letting you know something's wrong). It's also possible that the server you connected to has no idea about the final mail delivery, and is just there for load balancing, virus scanning, spam filtering, or something else, in which case it doesn't know at this point that the recipient is over quota.

        (of course, you wouldn't actually do all that by hand, I just needed to explain the steps... you should use Net::SMTP or one of the other mail sending modules in CPAN ... see How do I send e-mail from my Perl Program?)
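The steps above map onto Net::SMTP roughly as follows. This is only a minimal sketch, not a drop-in mailer: the helper names (build_message, send_mail), the hash keys and the example.com host names are all made up for illustration. Note that a success reply here only means the server you talked to accepted the message, not that it reached the mailbox.

```perl
use strict;
use warnings;
use Net::SMTP;

# Build what is sent after DATA: the header, a blank line, then the body.
# Net::SMTP converts the line endings and adds the closing "." itself.
sub build_message {
    my (%arg) = @_;
    return join "\n",
        "From: $arg{from}",
        "To: $arg{to}",
        "Subject: $arg{subject}",
        "",
        $arg{body},
        "";
}

# Returns (1, '') if the server accepted the message, or (0, $reason).
# All argument names are hypothetical.
sub send_mail {
    my (%arg) = @_;

    # Steps 2 and 3: connect to port 25 and say HELO/EHLO.
    my $smtp = Net::SMTP->new($arg{host}, Hello => $arg{helo}, Timeout => 30)
        or return (0, "could not connect to $arg{host}");

    my $fail = sub {
        my ($step) = @_;
        my $code = $smtp->code;
        chomp(my $text = $smtp->message);
        $smtp->quit;
        return (0, "$step refused: $code $text");
    };

    $smtp->mail($arg{envelope_from}) or return $fail->('MAIL FROM');  # step 4
    $smtp->to($arg{to})              or return $fail->('RCPT TO');    # step 5
    $smtp->data                      or return $fail->('DATA');       # step 6
    $smtp->datasend(build_message(%arg));
    $smtp->dataend                   or return $fail->('message body');
    $smtp->quit;                                                      # step 7
    return (1, '');
}

# Example call (needs a reachable mail server, so it is left commented out):
# my ($ok, $why) = send_mail(
#     host          => 'mail.example.com',
#     helo          => 'www.example.com',
#     envelope_from => 'bounces@example.com',
#     from          => 'newsletter@example.com',
#     to            => 'joe.smith@example.com',
#     subject       => 'Please confirm your subscription',
#     body          => 'Click the link to confirm.',
# );
```

An immediate refusal at RCPT TO is the only failure you can show the subscriber while he is still looking at the form; anything later arrives as a bounce.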

        If the email is found to be undeliverable, the mail server will send a message to the envelope-from (that's the email address in the 'MAIL FROM' line, which doesn't necessarily need to be the same as the 'From' header). So, you have to send out the messages with a valid e-mail address as the envelope-from, and then have some sort of process to connect to that account (see Net::POP3 or Net::IMAP::Simple) and check for errors (or spam piling up). You could also potentially use something to process the mail on the mail server (see What're some good ways of notifying a perl script of an incoming email?)
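A process that checks the envelope-from account could look something like this with Net::POP3. It is a sketch under a big assumption: that the bounces are delivery status notifications in the RFC 3464 format, with Final-Recipient and Action fields. Plenty of servers send free-form bounces that this will miss. The function names and arguments are invented for the example.

```perl
use strict;
use warnings;
use Net::POP3;

# Pull the failed address out of one message, given as an array ref of
# lines (the form Net::POP3's get() returns). Returns undef unless the
# message looks like an RFC 3464 failure report.
sub failed_recipient {
    my ($lines) = @_;
    my $text = join '', @$lines;
    return undef unless $text =~ /^Action:\s*failed/mi;
    return $text =~ /^Final-Recipient:\s*rfc822\s*;\s*(\S+)/mi ? $1 : undef;
}

# Read the bounce mailbox and return the addresses that bounced.
# host/user/pass are hypothetical argument names.
sub collect_bounces {
    my (%arg) = @_;
    my $pop = Net::POP3->new($arg{host}, Timeout => 30)
        or die "cannot connect to $arg{host}\n";
    defined $pop->login($arg{user}, $arg{pass})
        or die "POP3 login failed\n";

    my @bounced;
    my $list = $pop->list || {};          # message number => size
    for my $num (sort { $a <=> $b } keys %$list) {
        my $lines = $pop->get($num) or next;
        my $addr  = failed_recipient($lines);
        next unless defined $addr;        # not a bounce we understand
        push @bounced, $addr;
        $pop->delete($num);               # removed when we quit cleanly
    }
    $pop->quit;
    return @bounced;
}

# my @bad = collect_bounces(host => 'mail.example.com',
#                           user => 'bounces', pass => 'secret');
```

Run from cron, this lets you mark addresses as bounced, but only hours or days after the form was submitted.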

        So um...you'd probably do better just reading perlfaq9

        Update: I should probably mention -- most e-mail servers will keep trying to deliver messages to servers that are down (i.e., you can't connect on port 25). You might get a message that it's having problems after a few hours, but you can't depend on it. Sendmail (which is so widespread we'll use it as our standard) sends its final 'I've given up' message after 5 days. You also run the risk of the e-mail being dropped by any server along the way, because it thinks your message is spam or a virus, or just because of problems. So remember -- it can take almost a week to go through, and you can't be sure that you'll get an error message if it doesn't go through.

        Also, I've been hearing a lot about Email::Valid on this board, but no one seems to recommend using Email::Valid::Loose.

        I suspect that's because most e-mail addresses are genuinely valid, and Email::Valid is fine for most circumstances.

        Doesn't it make more sense since it can verify a whole entire range of e-mail addresses?

        According to the Email::Valid::Loose docs, the kind of invalid addresses it accepts are often used by Japanese telcos for e-mailing mobile phones. They don't seem to be common anywhere else, so I guess most people (especially those not running Japanese-targeted websites) don't worry about this.

        I know a few people who have e-mail addresses that contain a "." before the "@", e.g. joe.smith@comcast.com

        The dot in that address is valid, because there is "smith" between it and the at sign. It's very easy to check that Email::Valid does indeed accept it as valid (and hence that Email::Valid::Loose isn't necessary to allow it through):

        $ perl -wMEmail::Valid -le "print scalar Email::Valid->address('joe.smith@example.com')"

        Whereas the sample invalid address from the Email::Valid::Loose docs does not validate with Email::Valid:

        $ perl -wMEmail::Valid -le "print scalar Email::Valid->address('read_rfc822.@docomo.ne.jp')"

        Apparently Email::Valid::Loose would allow that through, but I've never felt the desire to check: putting a dot directly before the at sign seems completely pointless to me, and since it violates the standard I'm not going out of my way to assist anybody doing this.
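For use inside a script rather than on the command line, the same two checks might look like this. It is a sketch: check_address is a made-up wrapper, while -address, -mxcheck and $Email::Valid::Details come from the Email::Valid documentation. Passing -mxcheck => 0 keeps it to a pure syntax check with no DNS lookup.

```perl
use strict;
use warnings;
use Email::Valid;

# Returns (1, $address) if the syntax is acceptable, or (0, $failed_check).
sub check_address {
    my ($input) = @_;
    my $address = Email::Valid->address(-address => $input, -mxcheck => 0);
    return (1, $address) if defined $address;
    return (0, $Email::Valid::Details);   # name of the check that failed
}

for my $input ('joe.smith@example.com', 'read_rfc822.@docomo.ne.jp') {
    my ($ok, $info) = check_address($input);
    print $ok ? "accepted: $info\n" : "rejected: $input (failed $info check)\n";
}
```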

        Smylers

Re: Yet Another E-mail Validation Question
by johnnywang (Priest) on Apr 21, 2005 at 01:30 UTC
    As others have pointed out, the short answer is no. Without sending an email, you can't tell for sure whether an email address is valid (all that email validation modules do is eliminate addresses that are definitely not valid). Furthermore, even after you send an email, you still can't be sure whether it's valid, certainly not while the user is waiting. There are many reasons for this: first, it can take an indefinite amount of time to deliver/process it; second, some/many email servers will remain silent even when an address is not deliverable (to fight spam).
Re: Yet Another E-mail Validation Question
by TedPride (Priest) on Apr 21, 2005 at 02:12 UTC
    EDIT: Rewritten slightly to make my meaning clearer.

    I would use a combination of double email fields (it isn't as irritating as you think), server-run email verification with hash link, and user-run email verification via hash code included in the body of his email. The latter method would be described under "Didn't get your verification email?" and would basically consist of the user emailing a specified hash code to your verification address from the email the user signed up with. All you need is a script that can run through the contents of your email box every so often, matching email addresses and hash codes to unverified user accounts.

    Of course, if your email system blocks their email as well, then they're stuck. But this isn't nearly as likely.

      The latter method would be described under "Didn't get your verification email?" and would basically consist of the user emailing a specified hash code to your verification address from the email the user signed up with.

      I don't like this method too much, mainly because from a personal mail server I can send a mail with "From:" set to any address I want. So with this method I could subscribe and confirm any user I want, and I think this is not a very good practice.

      The thing with bounced e-mails is that a valid, working e-mail address can still bounce due to an error such as a full mailbox. If the e-mail does bounce, is there any way I can make my Perl script notify the subscriber through HTML?

      It depends on the configuration of the remote mailserver. Maybe it returns a 4xx error when the user's mailbox is full and a 5xx when the user doesn't exist, or maybe it accepts anything and sends DSNs, or maybe... I think you should only send the mail, and then the user should worry about why it didn't get delivered.
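If you do want to look at the reply code, the first digit is what matters. A small sketch (classify_reply and its return values are invented names), keeping in mind that the remote server decides which code to use for which problem:

```perl
use strict;
use warnings;

# Map an SMTP reply code to what the sending side should do next.
sub classify_reply {
    my ($code) = @_;
    return 'invalid'  unless defined $code && $code =~ /\A[2-5]\d\d\z/;
    return 'accepted' if $code =~ /\A2/;   # server took the message; not proof of delivery
    return 'continue' if $code =~ /\A3/;   # e.g. 354 after DATA
    return 'retry'    if $code =~ /\A4/;   # temporary failure, try again later
    return 'reject';                       # 5xx: permanent failure
}

print "$_: ", classify_reply($_), "\n" for 250, 354, 452, 550;
```

Only the 'reject' case is worth showing to the subscriber straight away; a 'retry' may still turn into a successful delivery days later.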

Re: Yet Another E-mail Validation Question
by Anonymous Monk on Apr 21, 2005 at 08:37 UTC
    I understand that there is no complete regex to actually determine whether an e-mail address is valid/deliverable.
    What's the obsession of people to check for valid email syntax with a regular expression? Checking whether a given email address has a valid syntax (which isn't the same as being deliverable) isn't too hard - it takes a bit of grunt work, but the grammar of RFC (2)822 is pretty straightforward. There are even some CPAN modules out there that do this.

    But doing it with a regex is pretty hard. Yet, people seldom ask how to check whether an email address is valid - instead, most of them insist on a regex.

Yet another comment on Yet Another E-mail Validation Question
by RatArsed (Monk) on Apr 21, 2005 at 11:48 UTC

    In short: "It can't be done"

    To elaborate: the only way you can tell reliably if an email was received is to see if it is responded to. This also covers the case of people entering other users' email addresses.

    To do all of this in the timescale of an HTTP request just isn't going to happen.

    My suggestion, as I've implemented before, is to do everything in your best interests to see if it's worth trying to send an email (is it syntactically correct, is there an MX record, etc) and email a validation link with a secret in it. -- This seems to be the sort of thing you were suggesting.
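One way to put a secret in the link is to derive it from the address with a keyed hash, so nothing has to be stored until the user clicks. This is a sketch of that idea only, not necessarily what was implemented above; the other common choice is a random token saved next to the pending subscription. The secret, the URL and all function names are placeholders.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side secret; it never appears in the link itself.
my $SECRET = 'change-me';

# Token that only someone holding $SECRET can compute for this address.
sub make_token {
    my ($email) = @_;
    return hmac_sha256_hex($email, $SECRET);
}

# Link to put in the validation mail.
sub validation_link {
    my ($base_url, $email) = @_;
    (my $escaped = $email) =~ s/([^A-Za-z0-9._~-])/sprintf '%%%02X', ord $1/ge;
    return sprintf '%s?email=%s&token=%s', $base_url, $escaped, make_token($email);
}

# Called by the script behind the link.
sub token_is_valid {
    my ($email, $token) = @_;
    return defined $token && $token eq make_token($email);
}

print validation_link('https://www.example.com/confirm', 'joe.smith@example.com'), "\n";
```

Because the token depends on the address, a link mailed to one address cannot be used to confirm a different one.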

    In my experience, end users do twig when the email they've asked for hasn't arrived and sign up again with the email address corrected -- maybe you could look at writing an auto-bounce handler later to flag a warning if they try with that email address again?

Re: Yet Another E-mail Validation Question
by gloryhack (Deacon) on Apr 21, 2005 at 14:26 UTC
    I've been using Mail::CheckUser for quite a while, and haven't had any complaints. It provides regex address syntax checking, with optional DNS and/or SMTP checks, and so far I haven't heard a complaint from a user that his valid email address has been rejected.

    I usually just warn the user that the check could take up to 30 seconds, and do the full DNS and SMTP checks while the poor slob waits around. In cases where the wait is considered unacceptable, I just do the DNS checks when the form is submitted (catching things like "aol.cmo"), and then weed the mailing list with the SMTP checks later. qmail and postfix are the main culprits in later bounces, but they're fewer and further between than they would be if we just blindly mailed to all of the syntactically valid addresses.

    Shouldn't your SMTP server be retrying on full mailboxes and other temporary (4xx) errors?

      What is the point of wasting those 30 seconds checking if the email is in a valid domain? Anyone can type george@aol.com and pass your check. I'm not trying to be a pain here, I really just wonder if there is some other benefit to this method of checking that I have missed. If you aren't checking that it is actually their email address then it would seem there is little point in checking if it is an email address at all.


      ___________
      Eric Hodges

        Checking that the domain is valid, and, to the extent possible, that the email address exists, saves network and server resources that would otherwise be consumed by rejections from the local SMTP server and bounces from remote SMTP servers.

        Of course anyone who's of a mind to be a pest can sign a disliked person up for a bajillion mailing lists -- that's why a well-designed list server will provide for double opt-in to confirm subscriptions. That's not the problem we're trying to solve here, though, is it?