in reply to On Validating Email Addresses

Validating email addresses with a regular expression is overrated. You can only check whether the address is well-formed (not whether it exists), and, as your .name example shows, if you are too zealous here you get into trouble by rejecting valid addresses.
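To make the .name problem concrete, here is a minimal sketch of an over-zealous check. The pattern in `too_strict` is hypothetical (it is not the validator from the original post); it assumes every top-level domain has two or three letters, which is exactly the kind of rule that breaks on .name:

```perl
use strict;
use warnings;

# Hypothetical over-strict check: assumes every TLD has 2 or 3 letters.
sub too_strict {
    my ($addr) = @_;
    return $addr =~ /^[\w.-]+\@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,3}$/ ? 1 : 0;
}

for my $addr ('john@example.com', 'john@smith.name') {
    printf "%-18s %s\n", $addr, too_strict($addr) ? 'accepted' : 'rejected';
}
```

The .com address passes and the .name address is rejected, even though both are perfectly good addresses.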

If someone does not want to give you his real email address, he can just type mickey.mouse@microsoft.com, which your validator routine would happily accept. If someone mistypes his email address by accident, the chance that your validator will catch it is very slim as well.

If you need to validate an email address, the only way to do that is to send an email to that address and wait for a reply. So for form validation, it makes no sense to check anything more than that the string contains an @ and at least one dot after that.

if ( $email =~ /\@.+\./ ) {  # email looks good
}
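Wrapped in a subroutine, the heuristic looks like this. The name `looks_like_email` and the sample addresses are only for illustration:

```perl
use strict;
use warnings;

# The "@ followed later by a dot" heuristic.
sub looks_like_email {
    my ($addr) = @_;
    return $addr =~ /\@.+\./ ? 1 : 0;
}

for my $addr ('john@smith.name', 'mickey.mouse@microsoft.com', 'not-an-address') {
    printf "%-28s %s\n", $addr, looks_like_email($addr) ? 'looks good' : 'rejected';
}
```

Note that the fake mickey.mouse address passes: the check only filters out input that is obviously not an address at all.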

Re^2: On Validating Email Addresses
by merlyn (Sage) on Jan 04, 2005 at 03:04 UTC
    the string contains an @ and at least one dot after that
    Even that's not quite right (see how easy this is to get wrong!).

    One of the country-code registrars (I forget which now) has addresses at the top-level domain! Like "foo@to" for the ".to" registrar.

    So please, don't look for a dot. Stick with the Email::Valid-style validators.
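    Running the dot-requiring heuristic against an address at a top-level domain shows the problem. `foo@to` is the example from the post; the subroutine names are hypothetical:

```perl
use strict;
use warnings;

# Original heuristic: an @ with at least one dot somewhere after it.
sub needs_dot { return $_[0] =~ /\@.+\./ ? 1 : 0 }

# Looser heuristic: the string merely contains an @.
sub has_at { return $_[0] =~ /\@/ ? 1 : 0 }

for my $addr ('foo@to', 'foo@example.com') {
    printf "%-16s needs_dot=%d has_at=%d\n", $addr, needs_dot($addr), has_at($addr);
}
```

    The dot check rejects `foo@to` outright, which is why a home-grown "obvious" rule is riskier than it looks.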

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      One of the country-code registrars (I forget which now) has addresses at the top-level domain! Like "foo@to" for the ".to" registrar.

      Wow. I am shocked ;-) Have to rewrite some code now...

      if ($email =~ /\@/) {  # email looks valid...
      }
Re^2: On Validating Email Addresses
by hardburn (Abbot) on Jan 04, 2005 at 14:58 UTC

    merlyn already noted the TLD problem. But really, you're now being too generous. The real solution is to use Email::Valid, which contains a very large and complex regex, plus a few other validation routines.

    As complex as that regex is, it still won't match embedded comments in the address, but that's usually not a problem.
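    A minimal sketch of calling Email::Valid from a form handler, assuming the CPAN module is installed. `Email::Valid->address` returns the address when it passes and undef otherwise; the `check_form_address` wrapper is hypothetical, and the script simply reports when the module is missing:

```perl
use strict;
use warnings;

# Email::Valid is a CPAN module, not core Perl; load it only if present.
my $have_email_valid = eval { require Email::Valid; 1 };

# Hypothetical wrapper: returns the validated address, or undef.
sub check_form_address {
    my ($input) = @_;
    return unless $have_email_valid;
    return Email::Valid->address($input);
}

if ($have_email_valid) {
    for my $addr ('user@example.com', 'no at sign') {
        my $ok = check_form_address($addr);
        print "$addr: ", (defined $ok ? 'valid' : 'invalid'), "\n";
    }
}
else {
    print "Email::Valid is not installed\n";
}
```

    Email::Valid also accepts named options; for example `Email::Valid->address(-address => $input, -mxcheck => 1)` additionally asks for a DNS lookup on the domain, which needs working name resolution at validation time.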

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      What?! Email::Valid fails on embedded comments? That's an astonishingly common feature of actual email addresses in the wild. I managed a number of public inboxes for a global corporation for a few years and I had to take special care in my own email address parsing code (in a VB dialect) to handle comments.

      I mean comments of the form (Fname Lname) <addr@example.com> and <addr@example.com> (Fname Lname). I never saw addr@example( ... ).com. Of those three forms, which are supported? Anything good will handle the first two, and I don't think the third matters. I'm speaking only from what I saw in actual usage.

        s/embedded/nested/g

        The regex doesn't handle comments nested inside of comments. It does handle comments (one level deep only).
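        Nesting is the hard part because a classic regular expression cannot balance arbitrarily deep parentheses, but Perl 5.10 and later support recursive subpatterns. The sketch below strips RFC 822-style comments, nested or not. It illustrates that technique only and is not how Email::Valid works internally; `strip_comments` is a hypothetical name:

```perl
use strict;
use warnings;

# Matches one parenthesised comment, allowing nested (...) and
# backslash-escaped characters inside it. (?&comment) recurses into
# the named group, which Perl supports from 5.10 on.
my $comment = qr/
    (?<comment>
        \(
        (?: [^()\\]++      # ordinary characters
          | \\.            # quoted pair such as \( or \)
          | (?&comment)    # a nested comment
        )*
        \)
    )
/x;

sub strip_comments {
    my ($text) = @_;
    $text =~ s/$comment//g;
    $text =~ s/^\s+|\s+$//g;    # tidy the whitespace left behind
    return $text;
}

print strip_comments('addr@example.com (Fname (nick) Lname)'), "\n";
```

        The sketch ignores quoted strings, so a parenthesis inside "..." in the local part would be mishandled.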

        - tye        

        Besides tye's point, I don't think it matters much in common use of Email::Valid anyway. I've only used it for validating form input, and I imagine that is the most common case. How often do you type (Fname Lname) <addr@example.com> into a form? I always just type the bare address.


      But really, you're now being too generous. The real solution is to use Email::Valid, which contains a very large and complex regex, plus a few other validation routines.

      Well, my point was that you cannot validate the email with a regular expression anyway. You are very unlikely to even catch typos. If my email is bill@microsoft.com and I mistype it as bikk@microsoft.com, how is Email::Valid going to help you? So why bother at all?

      Concession: Email::Valid can also check if an MX entry exists for the domain. That might make sense in some situations (but it still does not check the user name -- is there a way to do this, too?)

        As you suggested, about the best possible check that you can hope to perform for the purposes of catching typos is to ask the user to type it twice.

        The problem with that is that I always copy and paste when I'm asked to do that, so if I type it incorrectly the first time, it just gets confirmed incorrectly.
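        If you do ask for the address twice, the comparison might look like the sketch below. It trims whitespace and lower-cases only the domain part, since domain names are case-insensitive while the local part may, strictly speaking, be case-sensitive. `normalise` and `same_address` are hypothetical helpers, and as noted above, none of this helps when the user pastes the same typo into both fields:

```perl
use strict;
use warnings;

# Normalise one copy: trim whitespace, lower-case the part after the last @.
sub normalise {
    my ($addr) = @_;
    $addr =~ s/^\s+|\s+$//g;
    my $at = rindex($addr, '@');
    return $addr if $at < 0;
    return substr($addr, 0, $at) . lc(substr($addr, $at));
}

# True when both form fields hold the same address after normalisation.
sub same_address {
    my ($first, $second) = @_;
    return normalise($first) eq normalise($second) ? 1 : 0;
}

print same_address('Bill@Microsoft.com ', 'Bill@microsoft.com') ? "match\n" : "mismatch\n";
```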


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.

        Ah, but even checking the MX is fraught with danger. What if the name servers are temporarily offline, the local name servers are not working, or no local name server is configured at all because of security rules? Do you wait 30 seconds to fall back to the secondary? Report the address as invalid? It's probably much better to send some sort of cookie to the e-mail address before continuing, if having a working e-mail address really is important.
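        A minimal sketch of the "send a cookie" approach: derive a token from the address and a timestamp with an HMAC, mail a link containing it, and accept the address only when the token comes back. It uses Digest::SHA (core since Perl 5.10); the secret, the one-day lifetime and the function names are assumptions, and actually sending the mail is left out:

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $SECRET   = 'change-me';      # hypothetical server-side secret
my $LIFETIME = 24 * 60 * 60;     # assumed: tokens expire after one day

# Token to embed in the confirmation link mailed to $email.
sub make_token {
    my ($email, $issued) = @_;
    return hmac_sha256_hex("$email|$issued", $SECRET);
}

# Called when the user follows the link: true only if the token matches
# this address and timestamp and has not expired.
sub confirm_token {
    my ($email, $issued, $token, $now) = @_;
    return 0 if $now - $issued > $LIFETIME;
    return make_token($email, $issued) eq $token ? 1 : 0;
}

my $issued = time;
my $token  = make_token('user@example.com', $issued);
print confirm_token('user@example.com', $issued, $token, $issued + 60) ? "confirmed\n" : "rejected\n";
```

        Unlike an MX lookup, this proves that someone actually reads mail at the address, and a temporary DNS failure only delays the confirmation instead of rejecting the user.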

        As an aside, + is valid in the user name (local) part of an e-mail address, and I really do try to use it regularly. However, the only form I've found so far that actually accepts it without causing problems is the Mailman interface.
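        Forms that reject + usually use a character class narrower than the standard allows. The sketch below contrasts a hypothetical strict local-part pattern with one built from the "atext" characters RFC 5322 permits in an unquoted (dot-atom) local part; quoted local parts are not covered:

```perl
use strict;
use warnings;

# Hypothetical strict pattern: word characters, dots and hyphens only.
my $strict_local = qr/^[\w.-]+$/;

# Unquoted dot-atom local part per RFC 5322: runs of atext separated by single dots.
my $atext     = qr/[A-Za-z0-9!#\$%&'*+\/=?^_`{|}~-]/;
my $rfc_local = qr/^$atext+(?:\.$atext+)*$/;

for my $addr ('user@example.com', 'user+perlmonks@example.com') {
    my ($local) = $addr =~ /^(.*)\@/;
    printf "%-28s strict=%s rfc=%s\n", $addr,
        ($local =~ $strict_local ? 'ok' : 'rejected'),
        ($local =~ $rfc_local    ? 'ok' : 'rejected');
}
```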