Stopping excessive periods in email

htmanning has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Stopping excessive periods in email by pryrt (Abbot) on Jul 16, 2021 at 19:09 UTC
First, it's a patently bad idea: my active gmail address has two dots before the @ (think first.middle.last@gmail or similar), and for years I had an @alumni.collegenamehere.edu alternate email address (until they stopped providing email for alumni). So I personally have had at least two valid, non-spam emails that had more dots than your rules allow. (That's as annoying as the institutions who don't allow my email with a hyphen in the username, so I've had to create an alias email that doesn't include the hyphen.)

Second, you should be able to test it yourself. In this example, I use Test::More `unlike` (which will have the test pass if it doesn't match the regex, and fail if it does) and throw a bunch of emails at the regex, to see which ones would be "good" emails and which would be "bad". If you have other outlier emails you wanted to test, you could add more emails into the test list.

    #!perl
    use 5.012; # strict, //
    use warnings;
    use Test::More;

    for (
        qw/
            test@gmail.com
            test.test@gmail.com
            test.test.test@gmail.com
            test.test.test.test@gmail.com
            test@subdomain.domain.com
            test.test@subdomain.domain.com
            test.test.test@subdomain.domain.com
            test@test@domain.example
            test,test@domain.example
            test@first.example,test@domain.example
        /, "contains\nnewline\@fake.address"
    ) {
        unlike $_, qr/\@.*\@|,|\..*\..*\.|\n/i, "test '$_'";
    }
    done_testing;
    __END__
    Possible attempt to separate words with commas at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 18.
    ok 1 - test 'test@gmail.com'
    ok 2 - test 'test.test@gmail.com'
    not ok 3 - test 'test.test.test@gmail.com'
    #   Failed test 'test 'test.test.test@gmail.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test.test@gmail.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 4 - test 'test.test.test.test@gmail.com'
    #   Failed test 'test 'test.test.test.test@gmail.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test.test.test@gmail.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    ok 5 - test 'test@subdomain.domain.com'
    not ok 6 - test 'test.test@subdomain.domain.com'
    #   Failed test 'test 'test.test@subdomain.domain.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test@subdomain.domain.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 7 - test 'test.test.test@subdomain.domain.com'
    #   Failed test 'test 'test.test.test@subdomain.domain.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test.test@subdomain.domain.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 8 - test 'test@test@domain.example'
    #   Failed test 'test 'test@test@domain.example''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test@test@domain.example'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 9 - test 'test,test@domain.example'
    #   Failed test 'test 'test,test@domain.example''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test,test@domain.example'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 10 - test 'test@first.example,test@domain.example'
    #   Failed test 'test 'test@first.example,test@domain.example''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test@first.example,test@domain.example'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 11 - test 'contains
    # newline@fake.address'
    #   Failed test 'test 'contains
    # newline@fake.address''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'contains
    # newline@fake.address'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    1..11
    # Looks like you failed 8 tests of 11.

So in my examples, 3 were "good" emails and the other 8 were "bad".

Since you are the one who is defining what is and isn't a valid email (I disagree with your rules, obviously; but that is irrelevant to the technical perl question of whether you are filtering the emails that you want to filter), only you can decide whether your regex filters enough of them or not.
Re: Stopping excessive periods in email by haukex (Archbishop) on Jul 16, 2021 at 20:42 UTC
As others have mentioned, adding a single rule to your checks is unlikely to be an effective way of stopping spam - you may want to consider captchas, established software to prevent spam, and so on, instead of trying to roll your own. Having said that, regexes are also not really a good way to parse email addresses, which is why I would use Email::Address; you may also be interested in Email::Valid (which isn't perfect either, but still better than hand-rolled checks).

    use warnings;
    use strict;
    use Email::Address;

    sub countdots {
        my @addrs = Email::Address->parse(shift);
        die "expected exactly one address" unless @addrs==1;
        my $x = (my $tmp = $addrs[0]->user) =~ tr/././;
        my $y = ($tmp = $addrs[0]->host) =~ tr/././;
        return wantarray ? ($x,$y) : $x+$y;
    }

    use Test::More;

    my @tests = (
        ['test@gmail.com', 0, 1],
        ['test.test@gmail.com', 1, 1],
        ['test.test.test@gmail.com', 2, 1],
        ['test.test.test.test@gmail.com', 3, 1],
        ['test.test.test.test.test@gmail.com', 4, 1],
        ['test@foo.bar.com', 0, 2],
        ['test.test@foo.bar.com', 1, 2],
    );

    plan tests => 2*@tests+1;

    for my $t (@tests) {
        my ($ad, $exp_x, $exp_y) = @$t;
        my ($got_x, $got_y) = countdots($ad);
        is $got_x, $exp_x;
        is $got_y, $exp_y;
    }
    is countdots($tests[0][0]), $tests[0][1]+$tests[0][2];
Re: Stopping excessive periods in email by marto (Cardinal) on Jul 16, 2021 at 19:05 UTC
Have you addressed the issues raised here?
Re: Stopping excessive periods in email by LanX (Saint) on Jul 16, 2021 at 19:45 UTC
4 things:

1. You'd be better off testing against a blacklist of "rules" (regexes or, even better, subs); your approach with or'ed regexes won't be maintainable.
2. You'll also need some quality management telling you which rule rejected which email address, or at least a log.
3. If that's still your formmailer, attackers will quickly learn how to beat your rules.
4. Who told you I'm Russian?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
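To make the first two points concrete, here is a minimal sketch of a rule list built from named subs, where the caller learns which rule rejected an address and can log it. Everything in it is an assumption for illustration: the rule names, the `@rules` table and the `rejected_by` helper are hypothetical, and the rules merely mirror the ones debated in this thread; they are not a recommendation.

```perl
use strict;
use warnings;

# Hypothetical rule table: each entry is [ name => sub ], and the sub
# returns true when the address should be rejected. The rules only
# mirror the ones discussed in this thread.
my @rules = (
    [ 'two @ signs'      => sub { my ($addr) = @_; ($addr =~ tr/@//) > 1 } ],
    [ 'comma'            => sub { my ($addr) = @_; $addr =~ /,/ } ],
    [ 'line break'       => sub { my ($addr) = @_; $addr =~ /[\r\n]/ } ],
    [ 'more than 2 dots' => sub { my ($addr) = @_; ($addr =~ tr/.//) > 2 } ],
);

# Returns the name of the first rule that rejects $addr,
# or nothing (undef in scalar context) if no rule fires.
sub rejected_by {
    my ($addr) = @_;
    for my $rule (@rules) {
        my ($name, $check) = @$rule;
        return $name if $check->($addr);
    }
    return;
}

for my $addr ('test@gmail.com', 'a@b@example.com', 'first.last@mail.example.com') {
    my $why = rejected_by($addr);
    # A stand-in for a real log: record which rule fired for which address.
    warn "rejected '$addr': $why\n" if defined $why;
}
```

Adding or retiring a rule is then a one-line change to the table, and the log shows which rules catch real spam and which ones only hit legitimate addresses.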
Re: Stopping excessive periods in email by Marshall (Canon) on Jul 16, 2021 at 18:57 UTC
Why don't you generate some test cases on your own, then post the results of running those test cases through your code? Show pass/fail for each test case and identify the areas where your regex fails (if it does). I am not sure how well this will accomplish your end goal of rejecting bad email addresses. But start by thinking through some test cases. See if you can find actual bogus addresses "seen in the wild".
Re: Stopping excessive periods in email by Anonymous Monk on Jul 17, 2021 at 13:01 UTC
"Russian hackers <...> bogus email addresses <...> more than 2 dots in an email string"

Please don't do that; you'd be blocking legitimate addresses too.

Best regards, an owner of an e-mail address that goes like `${firstname}.${lastname}@${division}.${department}.msu.ru`
Re: Stopping excessive periods in email by Anonymous Monk on Jul 17, 2021 at 13:56 UTC
I'm not sure your regex does what you say it does. You say it looks for two at signs and then a comma and then the periods. As I read it, it looks for two at signs OR a comma OR periods OR a return, so any string that has one of these will match. Specifically, the following all match: `' me @ myself @ I '`, `'Hello, sailor!'`, `'Fee. Fie. Foe. Fum.'`, and `"\n"`.
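A short sketch can confirm this reading. It assumes the pattern under discussion is the one visible in the test output of the first reply, with its quantifiers restored, so treat the exact pattern as an assumption. The `@branches` table and the `matching_branches` helper are hypothetical names used only to show which alternative fires.

```perl
use strict;
use warnings;

# ASSUMPTION: the pattern under discussion, with its quantifiers restored
# from the test output quoted in the first reply.
my $reject = qr/\@.*\@|,|\..*\..*\.|\n/i;

# The same pattern split into its four alternatives (labels are illustrative).
my @branches = (
    [ 'two @ signs' => qr/\@.*\@/     ],
    [ 'comma'       => qr/,/          ],
    [ 'three dots'  => qr/\..*\..*\./ ],
    [ 'newline'     => qr/\n/         ],
);

# Returns the labels of every alternative that matches $string.
sub matching_branches {
    my ($string) = @_;
    return map { $_->[0] } grep { $string =~ $_->[1] } @branches;
}

for my $string (' me @ myself @ I ', 'Hello, sailor!', 'Fee. Fie. Foe. Fum.', "\n", 'test@gmail.com') {
    (my $shown = $string) =~ s/\n/\\n/g;    # make the newline visible
    my @why = matching_branches($string);
    print "'$shown': ",
        ($string =~ $reject ? 'matches (' . join(', ', @why) . ')' : 'no match'),
        "\n";
}
```

Because the alternatives are joined with `|`, one matching branch is enough for the whole pattern to match, which is why strings that are not email addresses at all are caught too.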
Re^2: Stopping excessive periods in email by htmanning (Friar) on Jul 20, 2021 at 19:19 UTC
That's correct. That's what I'm after. Thanks!