Flame has asked for the wisdom of the Perl Monks concerning the following question:

I was building a more advanced (than the average ones I've found) e-mail verification regex, when I started running into an odd error.

The regex: m/^\w*?\@[a-z0-9.]*?\.(?-i:[a-z]){2,4}\z/i

The odd thing is that while it matches on you@here.com and other real addresses, when I place it into my program, it fails. I've managed to trace the error down to the
use CGI qw(:cgi);
line, but I can't understand what the problem might be. (Located by commenting out that line, testing, and testing again with it un-commented)

Anyone know what the problem might be?

Thanks



My code doesn't have bugs, it just develops random features.

Flame ~ Lead Programmer: GMS

Re: Regex Unexplained Failure
by merlyn (Sage) on Aug 03, 2002 at 23:33 UTC
    Not commenting on the rest, but here's where yours is broken for sure:
    ... \w*? ...
    Please. Please no. Please dear gahd no. That's not what a valid email address even begins to look like.

    Please follow the FAQ on this. Do not go off into useless territory.

    -- Randal L. Schwartz, Perl hacker

      Ok, so while I now understand why that wouldn't work for all real addys, any idea why it's failing with CGI? Even if I'm not going to use this, and go with a module or something like Email::Valid, I'd still like to know what I'm doing wrong to cause it to fail.




Re: Regex Unexplained Failure
by larsen (Parson) on Aug 03, 2002 at 23:38 UTC
    Rolling your own email address validator is generally considered a bad idea. Just look at the 6.5k regexp written by Jeffrey Friedl (that is used in Email::Valid), or dig into the code of Abigail-II's RFC::RFC822::Address (which uses a different technique based on Parse::RecDescent).

    Anyway, I don't have anything against reinventing wheels in order to learn. This leads to your question, which I can't answer :) You just say that your RE fails when put into your CGI. It would be nice to know what string was used in the match, what the desired output was, and what the actual output was. And a small, but not too small, piece of code that shows how your regex is used in context.

      This is the code I was using to test it:

      use GMS; # Own package, which in turn uses CGI; it was in there that I was commenting out different lines to see if I could find the error
      while(chomp(my $temp = <>)){
          print "Match\n" if($temp =~ m/^\w*?\@[a-z0-9.]*?\.(?-i:[a-z]){2,4}\z/i);
      }

      I would run it and enter an assortment of addresses, to see if they could work. Usually, just test@you.com while I was trying to locate the bug. Interestingly enough, it works fine when I use
      print "Match\n" if('test@you.com' =~ m/^\w*?\@[a-z0-9.]*?\.(?-i:[a-z]){2,4}\z/i);


      Though I can't imagine why it would operate differently just because of 'use CGI qw(:cgi);'...




      Here's another problem, now that I've managed to solve the dilemma of why it was only sometimes failing. I'm stuck with ActiveState on a Win32 system, and Email::Valid does not want to run there. PPM refuses because there is no valid package for Win32, and I can't seem to get CPAN working. (I've been trying that off and on for the past year now... 10 different make programs and none of them work...)

      Any suggestions?




        Have you tried Microsoft's nmake? I haven't had any problems with nmake in the past and wouldn't think you would either.

        To install Email::Valid, just download the gzip file from CPAN and check out the README, but it is probably just the usual perl Makefile.PL/(n)make/(n)make test/(n)make install. Since this is a pure Perl module, there shouldn't be any other problems. The only problem I can see right now is that Email::Valid requires Mail::Address. Just make sure that it is installed before you try to install Email::Valid.
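A quick way to confirm the prerequisite before building is to ask Perl whether the modules load at all. This is only a minimal sketch; the helper name module_available is made up for illustration and is not part of either module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    # Turn Mail::Address into Mail/Address.pm, the form require expects
    # when it is given a string instead of a bareword.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('Mail::Address', 'Email::Valid') {
    printf "%-15s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

If Mail::Address shows up as missing, install it first, then repeat the same steps for Email::Valid.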

Re: Regex Unexplained Failure
by Cody Pendant (Prior) on Aug 04, 2002 at 00:16 UTC
    What merlyn said.

    This regex wouldn't match my email address for a start.

    it matches on you@here.com and other real addresses
    but it wouldn't match "john.smith@johnsmith.com", would it?

    And if you're thinking "yes it would!", then you might like to check what the \w means in regexes. I spent a lot of time struggling when I first started learning Perl because I thought my concept of a word and Perl's were the same. They aren't.
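To see what \w does and does not cover, here is a small sketch that runs the regex from the question against a few addresses. The helper name check and the last two sample addresses are illustrative assumptions, not from the thread.

```perl
use strict;
use warnings;

# The pattern from the original question, unchanged.
my $re = qr/^\w*?\@[a-z0-9.]*?\.(?-i:[a-z]){2,4}\z/i;

# Hypothetical helper: 1 if the address matches the pattern, else 0.
sub check {
    my ($addr) = @_;
    return $addr =~ $re ? 1 : 0;
}

# \w means letters, digits and underscore. It does not include '.'.
for my $addr (
    'you@here.com',               # plain word before the @
    'john.smith@johnsmith.com',   # dot in the local part
    'john_smith@johnsmith.com',   # underscore is part of \w
    '@here.com',                  # \w*? also accepts nothing at all
) {
    printf "%-26s %s\n", $addr, check($addr) ? 'match' : 'no match';
}
# you@here.com               match
# john.smith@johnsmith.com   no match
# john_smith@johnsmith.com   match
# @here.com                  match
```

The last line is the other half of the problem: because of the *? quantifier, an address with an empty local part is accepted too.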
    --

    ($_='jjjuuusssttt annootthheer pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;
      Ahh, well that's because of a personal oversight, I didn't realize '.' was valid in the name itself, otherwise I would have included it... (Ie: I've never seen an addy with a dot before the @ before..., my bad...)

      I never claimed to be perfect, just good enough... usually...




Re: Regex Unexplained Failure
by dws (Chancellor) on Aug 04, 2002 at 00:53 UTC
    I've managed to trace the error down to the use CGI qw(:cgi);
    One of the unexpected side-effects of using CGI is that it invokes binmode() on STDIN, STDOUT, and STDERR. If somewhere upstream of the regex you assume "\n" instead of "\r\n", then something may be getting thrown off.
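A minimal sketch of how that could bite here, assuming the input lines arrive with "\r\n" endings once binmode is in effect: chomp removes only the trailing "\n", so a "\r" is left on the string and the \z anchor no longer lines up. The two helper names are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the original question, unchanged.
my $re = qr/^\w*?\@[a-z0-9.]*?\.(?-i:[a-z]){2,4}\z/i;

# Hypothetical helper: match after a plain chomp, which by default
# removes only a trailing "\n".
sub matches_after_chomp {
    my ($line) = @_;
    chomp $line;
    return $line =~ $re ? 1 : 0;
}

# Hypothetical helper: strip any trailing CR/LF characters before matching.
sub matches_after_strip {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line =~ $re ? 1 : 0;
}

print matches_after_chomp("test\@you.com\n"),   "\n";  # 1: LF-only line
print matches_after_chomp("test\@you.com\r\n"), "\n";  # 0: "\r" survives chomp
print matches_after_strip("test\@you.com\r\n"), "\n";  # 1: CR and LF both removed
```

That would also explain why the version with the literal 'test@you.com' keeps working: there is no line ending on that string to begin with.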

      Chomp only removes the last char, doesn't it? So it's keeping the \n part?

      Flame goes off to test it.




Re: Regex Unexplained Failure
by december (Pilgrim) on Aug 04, 2002 at 03:56 UTC

    Hmm... That regexp looks broken, i.e. it doesn't work in valid situations (as others already told you).

    I use something like:

    /^[a-z0-9-_.]+@[a-z0-9-.]*[a-z0-9]\.[a-z]{2,4}$/i
    ... for simple checks. This just requires something 'legal' in front of the '@', a basic check for valid chars in the domain name (A-Z, 0-9, hyphen and dot, where the last character before the TLD must be a letter or digit), and then 2-4 chars for the TLD. This doesn't really catch all invalid situations, but if you really wanted to do that, you might as well resolve the MX/A records for the given domain to check that it exists, and then even try a VRFY/EXPN check on the mailserver. I'm just saying, if people want to give a fake address, they will, even if it's just that of someone else :)
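For a feel of where this simpler pattern draws the line, here is a rough sketch that tries it on a handful of addresses. The helper name simple_check and the sample addresses are assumptions for illustration; the '@' is escaped here only to make it explicit that no array interpolation is intended.

```perl
use strict;
use warnings;

# The pattern from the post above, with the '@' escaped.
my $re = qr/^[a-z0-9-_.]+\@[a-z0-9-.]*[a-z0-9]\.[a-z]{2,4}$/i;

# Hypothetical helper: 1 if the address passes the simple check, else 0.
sub simple_check {
    my ($addr) = @_;
    return $addr =~ $re ? 1 : 0;
}

for my $addr (
    'john.smith@johnsmith.com',   # dot in the local part is allowed
    'user@example-.com',          # hyphen right before the TLD dot
    'user@example.museum',        # TLD longer than 4 letters
    'a+b@example.com',            # '+' is not in the local-part class
) {
    printf "%-26s %s\n", $addr, simple_check($addr) ? 'ok' : 'rejected';
}
# john.smith@johnsmith.com   ok
# user@example-.com          rejected
# user@example.museum        rejected
# a+b@example.com            rejected
```

The last two are deliverable-looking addresses that get turned away, which is the trade-off of a quick check like this one.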


       wouter