In addition to what thargas++ wrote: A very simple approach is to accept anything that has at least one arbitrary character left of the rightmost @ and, on the right-hand side of the rightmost @, at least one . with at least one arbitrary character on each side of it. A trailing . is allowed; a . right after the rightmost @ is not.
Rationale:
- The relevant RFCs allow nearly everything left of the rightmost @, and while most people use email addresses like joe.user@example.com, some people have more complex ones. Some mail servers use a part of the email address to help sort mail: joe.user+pizza@example.com and joe.user+pasta@example.com both get delivered to joe.user@example.com, but are automatically sorted into the pizza and pasta folders, respectively. Many people have names that do not match your local idea of what a name is, so their mail address won't match your local idea of what a mail account is. The recent RFCs allow Unicode (UTF-8), so expect umlauts, accented characters, Japanese, Chinese, Arabic, and many other letters in the account part of the mail address. In summary: don't restrict the left-hand side.
- Rules for domains change, as more and more TLDs are invented. Restricting domains or even just TLDs to some list or character set won't work in the long run.
- Domains may contain Unicode (encoded using Punycode).
- Most of the time, if not always, you do not want to deliver to local computers, but to computers somewhere else. So the domain part must contain at least one dot. Some people still use an IPv4 address in their mail address; this is also matched by the "dot surrounded by any characters" rule.
- A trailing dot is allowed in the domain part. Most people omit it because DNS resolution usually does the right thing without it, but adding a trailing dot makes absolutely clear where the mail has to be delivered.
- A leading dot in the domain part is not valid.
- And my favorite one: Comments (in parentheses) are allowed in both the account part and the domain part of the email address.
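The rules from the top of this post can be sketched in a few lines. This is only an illustration in Python, not a reference implementation; the function name looks_like_email is made up for this example, and the check deliberately does nothing beyond the rules described above.

```python
def looks_like_email(address: str) -> bool:
    """Very permissive check following the simple rules described above.

    Hypothetical helper for illustration only; it does not try to
    implement the full RFC grammar.
    """
    # Split at the rightmost "@"; everything left of it is the account part.
    account, at, domain = address.rpartition("@")
    if not at or not account:
        return False  # no "@" at all, or nothing left of the rightmost one
    if domain.startswith("."):
        return False  # a dot right after the "@" is not allowed
    # Require a dot that has at least one character on each side.
    # Looking only at domain[1:-1] excludes the first and last character,
    # so a lone trailing dot does not count, but "example.com." still passes.
    return "." in domain[1:-1]
```

With this sketch, joe.user+pizza@example.com, joe@example.com. and joe@192.0.2.1 are accepted, while joe@localhost and joe@.example.com are rejected.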
These simple rules will allow almost all email addresses. IPv6 addresses right of the rightmost "@" won't work, due to the "one dot required" rule. So you may want to relax that rule or extend it to require at least one dot or colon instead of just one dot.
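The "dot or colon" relaxation could look like the following sketch, again in Python and again only as an illustration. The names RELAXED_ADDRESS and looks_like_email_relaxed are invented for this example, and the bracketed [IPv6:...] form used in the usage note is how the SMTP RFC writes IPv6 address literals.

```python
import re

# Hypothetical pattern for illustration:
#   .+             at least one arbitrary character left of the "@"
#   @              the rightmost "@" (the domain part below excludes "@")
#   [^@.]          the domain must not start with a dot
#   [^@]*[.:][^@]+ a dot or colon followed by at least one more character
RELAXED_ADDRESS = re.compile(r".+@[^@.][^@]*[.:][^@]+")


def looks_like_email_relaxed(address: str) -> bool:
    """Permissive check that accepts a colon in place of the required dot."""
    # fullmatch() anchors the pattern at both ends of the string.
    return RELAXED_ADDRESS.fullmatch(address) is not None
```

This accepts joe@[IPv6:2001:db8::1] in addition to ordinary addresses like joe@example.com, and still rejects joe@localhost and joe@.example.com.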
Of course, these simple rules also let through a lot of bogus email addresses. You have to live with that. Your systems should already be able to handle undeliverable emails, because even syntactically valid email addresses may become undeliverable some day. People change jobs or mail providers, so the old email address will no longer be used or may be deleted.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)