Re: regexp to only allow for formally valid email addresses
by Zaxo (Archbishop) on Mar 07, 2007 at 17:50 UTC
|
use Email::Valid;
# . . .
my $spammer;
$spammer = 1
unless Email::Valid->address($f{'Email'})
or $f{'Email'} eq '';
There exists a regex that does what you want, but it is large and complex. Email::Valid uses a small parser.
Update: Separated the my declaration from the conditional assignment; that was a thinko.
|
There exists a regex that does what you want, but it is large and complex.
In another post I mentioned an "impressive example" that works with the newest blead, posted by Abigail in clpmisc; for completeness, I'm pasting it hereafter:
Re: regexp to only allow for formally valid email addresses
by Fletch (Bishop) on Mar 07, 2007 at 17:51 UTC
|
See Mail::RFC822::Address which has "the" regex for valid addresses. Right off I see that yours has one of the common problems that tend to tick me off, specifically disallowing "foo+identifier@example.com" style addresses (which lets me have one "foo@example.com" address but give out different "+identifier" tags to different people so I can label/tag/filter/toss accordingly).
Update: Also see RFC::RFC822::Address for a Parse::RecDescent based parser rather than a regex.
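To make the "+identifier" point concrete, here is a minimal sketch. It assumes a character-class regex of the kind tested elsewhere in this thread (the original poster's exact pattern is not reproduced here), and the names $strict, $relaxed and accepts are made up for illustration.

```perl
use strict;
use warnings;

# Character-class pattern of the kind discussed in this thread: no "+" allowed.
my $strict  = qr/^[a-zA-Z_\-.0-9]+\@[a-zA-Z_\-.0-9]+$/;
# Hypothetical relaxed variant: "+" added to the local part only.
my $relaxed = qr/^[a-zA-Z_\-.0-9+]+\@[a-zA-Z_\-.0-9]+$/;

# Returns 1 if $addr matches $re, 0 otherwise.
sub accepts {
    my ($re, $addr) = @_;
    return $addr =~ $re ? 1 : 0;
}

for my $addr ('foo@example.com', 'foo+identifier@example.com') {
    printf "%-28s strict=%d relaxed=%d\n",
        $addr, accepts($strict, $addr), accepts($relaxed, $addr);
}
```

The strict pattern rejects the tagged address while the relaxed one accepts both. Neither comes close to full RFC validation, which is why the modules mentioned above are the safer choice.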
Re: regexp to only allow for formally valid email addresses
by vrk (Chaplain) on Mar 07, 2007 at 17:53 UTC
|
$ perl -e 'print "valid\n" if ("foo\@bar" =~ m/^[a-zA-Z_\-.0-9]+@[a-zA-Z_\-.0-9]+$/);'
valid
$ perl -e 'print "valid\n" if ("j.random.hacker\@perlmonks.com" =~ m/^[a-zA-Z_\-.0-9]+@[a-zA-Z_\-.0-9]+$/);'
Of course, tests can never show the absence of errors. But I'm willing to bet you have a problem somewhere else in the program.
UPDATE: Seems like others beat me to it... I just remembered that there was some nice discussion about this over at The Daily WTF.
Re: regexp to only allow for formally valid email addresses
by ikegami (Patriarch) on Mar 07, 2007 at 23:03 UTC
|
In Perl 5.10, you'll be able to do
my $email_address = qr{
(?(DEFINE)
(?<addr_spec> (?&local_part) \@ (?&domain))
(?<local_part> (?&dot_atom) | (?&quoted_string))
(?<domain> (?&dot_atom) | (?&domain_literal))
(?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)?
\] (?&CFWS)?)
(?<dcontent> (?&dtext) | (?&quoted_pair))
(?<dtext> (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])
(?<atext> (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+\-/=?^_`{|}~])
(?<atom> (?&CFWS)? (?&atext)+ (?&CFWS)?)
(?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
(?<dot_atom_text> (?&atext)+ (?: \. (?&atext)+)*)
(?<text> [\x01-\x09\x0b\x0c\x0e-\x7f])
(?<quoted_pair> \\ (?&text))
(?<qtext> (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
(?<qcontent> (?&qtext) | (?&quoted_pair))
(?<quoted_string> (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
(?&FWS)? (?&DQUOTE) (?&CFWS)?)
(?<word> (?&atom) | (?&quoted_string))
(?<phrase> (?&word)+)
# Folding white space
(?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
(?<ctext> (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
(?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment))
(?<comment> \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) )
(?<CFWS> (?: (?&FWS)? (?&comment))*
(?: (?:(?&FWS)? (?&comment)) | (?&FWS)))
# No whitespace control
(?<NO_WS_CTL> [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])
(?<ALPHA> [A-Za-z])
(?<DIGIT> [0-9])
(?<CRLF> \x0d \x0a)
(?<DQUOTE> ")
(?<WSP> [\x20\x09])
)
(?&addr_spec)
}x;
Disallowing CR & LF would simply be a matter of changing
(?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
to
(?<FWS> (?&WSP)+)
Credit: The regexp was written by Abigail, who also wrote RFC::RFC822::Address.
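For anyone who wants to try the (?(DEFINE) ...) technique on a smaller scale first, here is a cut-down sketch. It is not Abigail's regex: it only handles dot-atom "@" dot-atom, with no comments, quoted strings, folding white space or domain literals. The names $simple_addr and looks_like_addr are hypothetical, and Perl 5.10 or later is assumed.

```perl
use strict;
use warnings;
use 5.010;   # named captures and (?&name) recursion need Perl 5.10+

# Simplified illustration only: dot-atom "@" dot-atom.
my $simple_addr = qr{
    (?(DEFINE)
        (?<addr_spec>     (?&dot_atom_text) \@ (?&dot_atom_text) )
        (?<dot_atom_text> (?&atext)+ (?: \. (?&atext)+ )* )
        (?<atext>         [A-Za-z0-9!#\$%&'*+\-/=?^_`{|}~] )
    )
    \A (?&addr_spec) \z
}x;

# Returns 1 if the whole string matches the simplified grammar, 0 otherwise.
sub looks_like_addr {
    my ($s) = @_;
    return $s =~ $simple_addr ? 1 : 0;
}

say "$_: ", looks_like_addr($_) ? 'match' : 'no match'
    for 'foo+tag@example.com', 'foo..bar@example.com', 'no-at-sign';
```

The rules inside the DEFINE block never match on their own; they only name subpatterns that the final (?&addr_spec) call pulls in, which is what keeps a grammar like the full one above readable.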
Re: regexp to only allow for formally valid email addresses
by Moron (Curate) on Mar 07, 2007 at 19:59 UTC
|
I agree with the suggestions to use a ready-made regexp. But if for some reason I had to reinvent one, I'd extend the \w token as much as necessary rather than start from scratch, something like:
$spammer = length($f{'Email'})
&& $f{'Email'} !~ /^(\w|\-|\.)+\@(\w|\-|\.)+$/;
|
Word characters (\w) might include local characters such as the German Ä, ö, ü and ß on a German web server. Although these umlaut characters should, in theory, be valid in email addresses (in my interpretation of RFC 822), I know from experience that their occurrence in email addresses usually causes problems sooner or later. At least one German provider (T-Online) used to allow those characters, but I would rather disallow them and have the user enter an email address that is safe for international use.
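A rough sketch of the difference, under some assumptions: instead of relying on a server locale, it uses the /u and /a regex modifiers (Perl 5.14 and later, so newer than this thread) to force Unicode or ASCII-only rules for \w. The names $unicode_w, $ascii_w, $explicit and check are made up for illustration.

```perl
use strict;
use warnings;
use 5.014;   # the /u and /a regex modifiers need Perl 5.14 or later

my $unicode_w = qr/^[\w.-]+\@[\w.-]+$/u;   # \w under Unicode rules: umlauts count
my $ascii_w   = qr/^[\w.-]+\@[\w.-]+$/a;   # \w restricted to [A-Za-z0-9_]
my $explicit  = qr/^[A-Za-z0-9_.-]+\@[A-Za-z0-9_.-]+$/;   # works on older perls too

# Returns 1 if $addr matches $re, 0 otherwise.
sub check {
    my ($re, $addr) = @_;
    return $addr =~ $re ? 1 : 0;
}

my $plain  = 'mueller@example.de';
my $umlaut = "m\x{fc}ller\@example.de";   # U+00FC (u-umlaut) written as an escape

for my $case (['plain', $plain], ['umlaut', $umlaut]) {
    my ($label, $addr) = @$case;
    printf "%-6s unicode=%d ascii=%d explicit=%d\n", $label,
        check($unicode_w, $addr), check($ascii_w, $addr), check($explicit, $addr);
}
```

Spelling out the ASCII class (or using /a where available) keeps the result independent of whatever locale or Unicode settings the server happens to have.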
Re: regexp to only allow for formally valid email addresses
by hangon (Deacon) on Mar 07, 2007 at 19:25 UTC
|
You need to escape the dots in your regex.
Update: Never mind this post. I stand corrected and learned Yet Another Perl Nuance. Thanks Thelonius & Fletch.
|
$ perl -le '$_ = "oh really?"; print unless /[.]/;'
oh really?
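The one-liner above relies on the fact that a dot inside a bracketed character class is a literal dot, escaped or not. A small sketch spelling that out (the helper names are hypothetical):

```perl
use strict;
use warnings;

# Unescaped dot inside a class: matches a literal "." only.
sub has_literal_dot { my ($s) = @_; return $s =~ /[.]/  ? 1 : 0 }
# Escaped dot inside a class: same meaning, the backslash is redundant.
sub has_escaped_dot { my ($s) = @_; return $s =~ /[\.]/ ? 1 : 0 }
# Bare dot outside a class: matches any character except newline.
sub has_any_char    { my ($s) = @_; return $s =~ /./    ? 1 : 0 }

for my $s ('oh really?', 'perlmonks.org') {
    printf "%-14s [.]=%d [\\.]=%d .=%d\n",
        $s, has_literal_dot($s), has_escaped_dot($s), has_any_char($s);
}
```

So [a-zA-Z_\-.0-9] and [a-zA-Z_\-\.0-9] behave identically; escaping the dot there is harmless but unnecessary.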
|
# my guess is that he's not trying to do this
=~ /^[.]+@[.]+$/
# either of these make more sense for matching an e-mail address
=~ /^[a-zA-Z_\-\.0-9]+@[a-zA-Z_\-\.0-9]+$/
=~ /^[\w\-\.]+@[\w\-\.]+$/
Or am I missing something painfully obvious?
|