Re: Common RegExps
by merlyn (Sage) on Jul 31, 2000 at 22:30 UTC
The regex for a syntactically valid email address runs to about a full screenful. Ditto for a URL. And IP addresses are better checked algorithmically, not matched statically as text.
So what problem are you really solving, for which you got to the step "use a regex"? Perhaps you should back up a step. {grin}
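To make "checked algorithmically" concrete, here is a minimal sketch, written in Python only for brevity (the same few steps port directly to Perl). The name `is_valid_ipv4` is made up for illustration, and the sketch assumes dotted-quad IPv4 only and tolerates leading zeros such as `010`:

```python
def is_valid_ipv4(text):
    """Return True if text looks like a dotted-quad IPv4 address."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each field must be plain ASCII digits: no sign, no whitespace, not empty.
        if not (part.isascii() and part.isdigit()):
            return False
        # The range check is plain arithmetic, which a regex can only spell out
        # as a clumsy alternation of digit patterns.
        if int(part) > 255:
            return False
    return True
```

Splitting first and comparing numbers second keeps each rule readable on its own, which is the point of backing up a step from "use a regex".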
-- Randal L. Schwartz, Perl hacker
Well, I was looking for something that parses xyz@somewhere.something.com: rules such as the valid characters in the username and in the IP address, and also how to get the TLD, etc. What I did not know was that an email address can contain nested structures, which of course makes the language context-free. Well, if that's the case, is there at least a module that checks the validity of email addresses?
mb
Re: Common RegExps
by BlaisePascal (Monk) on Jul 31, 2000 at 22:38 UTC
Is there a regex that recognises RFC822 email addresses? I'm not asking to see one, I'm asking if it is even possible! RFC822 is notoriously difficult. It wouldn't surprise me if it couldn't be done.
(For instance, doesn't RFC822 allow nested comments? If so, that would ruin it right there...)
The 'owl' book (Mastering Regular Expressions) is a great
text for questions like this, and my answer comes from it
(paraphrased):
No, you can't really recognize a valid email address with a
regex, because technically an email address can have arbitrarily
nested comments in parentheses, and a regular expression can
never recognize arbitrarily deep nested structures. When
you start talking about balanced constructs, you are out of
the land of regular languages and into the land of context-free
languages.
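A small sketch of why balanced constructs need more than a regular language: the check has to carry a counter that can grow without bound, which a finite automaton cannot do. It is written in Python purely for illustration, the name `max_comment_depth` is hypothetical, and it deliberately ignores the quoted strings and backslash escapes that RFC822 also allows:

```python
def max_comment_depth(text):
    """Return the deepest parenthesis nesting in text, or None if unbalanced."""
    depth = 0
    deepest = 0
    for ch in text:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None  # closing parenthesis with nothing open
    # Anything still open at the end means the comments are unbalanced.
    return deepest if depth == 0 else None
```

The `depth` variable is the piece of memory a true regular expression lacks; everything else is a single pass over the string.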
I wonder if it would be useful or just unnecessary to have
native support for context-free grammars in Perl...
To recognize all valid email addresses that have at most
one level of comments requires something like a 5000
byte regular expression.
The moral of this story is that regexes can't do everything.
-Mark
There's an appendix in Mastering Regular Expressions that gives a regex for RFC822 addresses, well, except for arbitrarily nested comments... I think it allows a maximum of 5 levels or something along those lines. It takes somewhere around 5 pages and is commented quite well, but it's still not something I'd ever want to have to build.
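For what a fixed maximum nesting depth looks like in practice, here is a rough sketch that builds such a pattern by repeated substitution. This is not the regex from the book, only an illustration of the idea; it is in Python for brevity, `comment_pattern` is a made-up name, and it handles bare parentheses only (no quoted pairs or the rest of the address grammar):

```python
import re

def comment_pattern(max_depth):
    """Build a regex source string matching parenthesized comments nested
    at most max_depth levels deep (max_depth >= 1)."""
    # Depth 1: parentheses around text that contains no parentheses.
    pattern = r"\([^()]*\)"
    for _ in range(max_depth - 1):
        # Each extra level allows plain text or a comment of the previous depth.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern

five_levels = re.compile(comment_pattern(5))
```

Every extra level wraps the previous pattern in another layer, so the source text grows with the depth you choose, and anything nested one level deeper than that limit simply fails to match.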
-Ted
RE: Common RegExps
by t0mas (Priest) on Aug 01, 2000 at 02:51 UTC