Translation of reg expression

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Translation of reg expression by maa (Pilgrim) on Dec 31, 2003 at 15:33 UTC
Hi, this looks like a shoddy attempt at validating an email address. If you're interested in validating email addresses, read through the node "validate an email address", which directs you to several CPAN modules that do this. If you just want a simple explanation of `.+@.+\..+`, it says: any character except a newline (one or more), followed by @ (although it should be written \@; a plain @ could generate warnings or errors), followed by any character except a newline (one or more), followed by a literal '.', followed by any character except a newline (one or more). See perldoc perlre for details. HTH - Mark
Re: Translation of reg expression by tcf22 (Priest) on Dec 31, 2003 at 15:26 UTC
`.+` - Any character 1 or more times
`@` - A literal @
`.+` - Any character 1 or more times
`\.` - A literal .
`.+` - Any character 1 or more times

`.+` is any character 1 or more times, not just alphanumeric characters. - Tom
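The breakdown above can be written straight into the pattern with the `/x` modifier, which lets each piece carry its own comment. This is only a sketch: the helper name `matches_pattern` and the sample strings are made up for illustration, and the `@` is escaped as a precaution against array interpolation.

```perl
use strict;
use warnings;

# The pattern from the question, spread out with /x so each piece is commented.
my $pattern = qr/
    .+      # any character except newline, one or more times
    \@      # a literal at-sign (escaped to rule out array interpolation)
    .+      # any character except newline, one or more times
    \.      # a literal dot
    .+      # any character except newline, one or more times
/x;

# Hypothetical helper: returns 1 if the pattern matches anywhere in the string.
sub matches_pattern {
    my ($string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Hypothetical sample inputs, chosen for illustration only.
my @samples = ('user@example.com', '!!!@???.###', 'no-at-sign.example', 'a@b');

for my $s (@samples) {
    printf "%-22s %s\n", $s, (matches_pattern($s) ? 'matches' : 'does not match');
}
```

The second sample matches even though it contains nothing but punctuation, which is the point Tom makes about `.+` not being limited to alphanumerics.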
Re: Translation of reg expression by hardburn (Abbot) on Dec 31, 2003 at 15:33 UTC
Looks like a broken attempt at parsing an e-mail address. Broken because:

Not just any character can be part of an e-mail address (though more characters are allowed than most people think).

Domain names don't necessarily have a dot in them.

More than one '@' character is allowed in an e-mail address to specify a relay. The regex you have would match it, but wouldn't get the correct user (the greediness of `+` would put the relay as part of the user portion). (To be fair, this syntax is usually disallowed due to the spammer-friendly nature of this misfeature.)

The accepted regex for parsing an e-mail address is several hundred characters long, and it doesn't even match embedded comments. See Email::Valid.

----
I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident. -- Schemer

`: () { :\|:& };:`

Note: All code is untested, unless otherwise stated
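To see the greediness point in action, here is a rough sketch that adds anchors and capture groups to the pattern from the question. Both additions are assumptions made for illustration (the original pattern has neither), and `split_address` and the sample address are hypothetical.

```perl
use strict;
use warnings;

# Anchored, capturing variant of the pattern under discussion.
# The anchors and the two capture groups are additions for this demo.
my $re = qr/^(.+)\@(.+\..+)$/;

# Hypothetical helper: returns (user, domain) on a match, or an empty list.
sub split_address {
    my ($addr) = @_;
    if (my ($user, $domain) = $addr =~ $re) {
        return ($user, $domain);
    }
    return;
}

# Hypothetical address containing two @ characters.
my ($user, $domain) = split_address('user@relay.example@host.example');

print "user:   $user\n";      # prints "user:   user@relay.example"
print "domain: $domain\n";    # prints "domain: host.example"
```

The first `(.+)` grabs as much as it can, so everything up to the last `@` ends up in the user capture.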
Re: Translation of reg expression by exussum0 (Vicar) on Dec 31, 2003 at 15:42 UTC
There's a regular expression that works "well" on email addresses in the Perl Cookbook, which is what this regexp seems to be attempting. Unfortunately, doing it solely with a regexp would miss some specialty addresses, since the specs for various mail systems allow some really funky stuff, as the Perl Cookbook points out. So it's probably a really low-priority bug that the regexp in question is even there. I don't have my copy next to me, but check it out. It should be near the back somewhere, if you have free time to learn why it's a bad regexp and why the one they provide isn't the be-all and end-all of regexps and email. Play that funky music white boy..
Re: Re: Translation of reg expression by Anonymous Monk on Dec 31, 2003 at 17:04 UTC
Thanks all!
Re: Translation of reg expression by hanenkamp (Pilgrim) on Dec 31, 2003 at 15:41 UTC
First, I would suggest consulting the appropriate documentation: perlrequick, perlretut, and perlre.

Now, the period will typically match any character except newline ("`\n`"). This includes all punctuation. Therefore, the expression will match just about any email address, along with plenty of strings that are not email addresses. The first `.+` greedily gobbles up everything, including any `@`, and the engine then backtracks, giving characters back until the `@` and the rest of the pattern can match. With more than one `@` in the string, the split therefore lands on the last `@` that still lets the rest match. A stricter pattern might be: `[^@]+@[^\.]+\..+`

Here the first `[^@]+` matches one or more of anything that's not an `@`. Then, it matches the `@`. Next, it matches one or more characters that aren't periods. Then, the period. Finally, all other characters. This still is not what you want. It will also match other non-email type strings, such as: `!@#$%^.&*`

If you really want to match an email containing only alphanumerics, then `\w` is probably what you are looking for. It matches any Perl word character and is essentially equivalent to `[0-9a-zA-Z_]`. So, to match one or more alphanumerics followed by an `@`, then one or more alphanumerics followed by a `.`, then one or more alphanumerics, try: `\w+@\w+\.\w+`

This, however, is too stringent, as email addresses may legally contain many other characters besides alphanumerics, might contain multiple periods after the `@`, etc.
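A quick way to compare the three patterns from this post is to run them side by side. The sketch below anchors each one with `^` and `$` so the whole string has to match; that anchoring, the escaped `@`, the pattern labels, the helper `matching_patterns`, and the sample strings are all assumptions added for illustration.

```perl
use strict;
use warnings;

# The three patterns from the post, anchored so the whole string must match.
my %patterns = (
    'dot-plus'   => qr/^.+\@.+\..+$/,
    'negated'    => qr/^[^\@]+\@[^\.]+\..+$/,
    'word-chars' => qr/^\w+\@\w+\.\w+$/,
);

# Hypothetical helper: returns the sorted labels of the patterns that match.
sub matching_patterns {
    my ($string) = @_;
    return grep { $string =~ $patterns{$_} } sort keys %patterns;
}

# Hypothetical sample inputs, chosen for illustration only.
my @samples = (
    'user@example.com',
    '!@#$%^.&*',
    'first.last@mail.example.com',
    'user@localhost',
);

for my $s (@samples) {
    my @hits = matching_patterns($s);
    printf "%-30s %s\n", $s, (@hits ? join(', ', @hits) : '(none)');
}
```

With these inputs, the punctuation-only string gets past the first two patterns, the `\w` version rejects the address with a dot in its local part and a multi-part domain, and none of the three accepts a domain without a dot.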
Re: Re: Translation of reg expression by hardburn (Abbot) on Dec 31, 2003 at 15:49 UTC
". . . might contain multiple periods after the @, etc."

Or no periods at all. Take a look at the regex in the Email::Valid module. (If you stare at it long enough, it starts to form a picture like those Magic Eye posters that were popular a few years back.)
Re: Translation of reg expression by Anonymous Monk on Jan 01, 2004 at 02:03 UTC
E:\>perl -MYAPE::Regex::Explain -le"print YAPE::Regex::Explain->new('.+@.+\..+')->explain"
The regular expression:

(?-imsx:.+@.+\..+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  @                        '@'
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------