PerlMonks  

Stopping excessive periods in email

by htmanning (Friar)
on Jul 16, 2021 at 18:48 UTC ( [id://11135077]=perlquestion )

htmanning has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I'm trying to stop bogus Russian hackers who use bogus email addresses. I'm trying to detect more than 2 dots in an email string. For example, test.test@gmail.com is fine, but test.test.test.test@gmail.com is not. The following seems to work, but I wonder if someone can validate what I have. I first detect if there are two at signs that signal they are trying to bcc others, and then I look for a comma, and then the periods. Is this right?

if ($sender =~ /\@.*\@|,|\..*\..*\.|\n/i) {
Thanks.

Replies are listed 'Best First'.
Re: Stopping excessive periods in email
by pryrt (Abbot) on Jul 16, 2021 at 19:09 UTC
    First, it's a patently bad idea: my active gmail address has two dots before the @ (think first.middle.last@gmail or similar), and for years, I had an @alumni.collegenamehere.edu alternate email address (until they stopped providing email for alumni). So I personally have had at least two valid, non-spam emails that had more dots than your rules allow. (That's as annoying as the institutions who don't allow my email with a hyphen in the username, so I've had to create an alias email that doesn't include the hyphen.)

    Second, you should be able to test it yourself. In this example, I use Test::More's unlike (which passes the test if the string doesn't match the regex, and fails it if it does) and throw a bunch of emails at the regex, to see which ones would be "good" emails and which would be "bad". If you have other outlier emails you want to test, you can add them to the test list.

    #!perl
    use 5.012; # strict, //
    use warnings;
    use Test::More;

    for (qw/
        test@gmail.com
        test.test@gmail.com
        test.test.test@gmail.com
        test.test.test.test@gmail.com
        test@subdomain.domain.com
        test.test@subdomain.domain.com
        test.test.test@subdomain.domain.com
        test@test@domain.example
        test,test@domain.example
        test@first.example,test@domain.example
    /, "contains\nnewline\@fake.address") {
        unlike $_, qr/\@.*\@|,|\..*\..*\.|\n/i, "test '$_'";
    }
    done_testing;
    __END__
    Possible attempt to separate words with commas at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 18.
    ok 1 - test 'test@gmail.com'
    ok 2 - test 'test.test@gmail.com'
    not ok 3 - test 'test.test.test@gmail.com'
    #   Failed test 'test 'test.test.test@gmail.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test.test@gmail.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 4 - test 'test.test.test.test@gmail.com'
    #   Failed test 'test 'test.test.test.test@gmail.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test.test.test@gmail.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    ok 5 - test 'test@subdomain.domain.com'
    not ok 6 - test 'test.test@subdomain.domain.com'
    #   Failed test 'test 'test.test@subdomain.domain.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test@subdomain.domain.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 7 - test 'test.test.test@subdomain.domain.com'
    #   Failed test 'test 'test.test.test@subdomain.domain.com''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test.test.test@subdomain.domain.com'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 8 - test 'test@test@domain.example'
    #   Failed test 'test 'test@test@domain.example''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test@test@domain.example'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 9 - test 'test,test@domain.example'
    #   Failed test 'test 'test,test@domain.example''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test,test@domain.example'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 10 - test 'test@first.example,test@domain.example'
    #   Failed test 'test 'test@first.example,test@domain.example''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'test@first.example,test@domain.example'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    not ok 11 - test 'contains
    # newline@fake.address'
    #   Failed test 'test 'contains
    # newline@fake.address''
    #   at C:\Users\peter.jones\Downloads\TempData\perl\pm.pl line 20.
    #                   'contains
    # newline@fake.address'
    #           matches '(?^ui:\@.*\@|,|\..*\..*\.|\n)'
    1..11
    # Looks like you failed 8 tests of 11.

    So in my examples, 3 were "good" emails and the other 8 were "bad".

    Since you are the one who is defining what is and isn't a valid email (I disagree with your rules, obviously; but that is irrelevant to the technical perl question of whether you are filtering the emails that you want to filter), only you can decide whether your regex filters enough of them or not.
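
If you would rather have the test run pass whenever the regex behaves the way you intend, one option is to pair each address with the verdict you expect. The following is only a sketch under that assumption: the helper name `is_rejected` and the layout of `@cases` are made up for illustration, and the expected values simply restate a subset of the results shown above.

```perl
use strict;
use warnings;
use Test::More;

# The regex from the original question, unchanged.
my $reject = qr/\@.*\@|,|\..*\..*\.|\n/i;

# Each entry: [ address, 1 if the regex is expected to reject it, else 0 ].
# The expectations restate the results already shown in this reply.
my @cases = (
    [ 'test@gmail.com',                  0 ],
    [ 'test.test@gmail.com',             0 ],
    [ 'test.test.test@gmail.com',        1 ],
    [ 'test@subdomain.domain.com',       0 ],
    [ 'test.test@subdomain.domain.com',  1 ],
    [ 'test@test@domain.example',        1 ],
    [ 'test,test@domain.example',        1 ],
    [ "contains\nnewline\@fake.address", 1 ],
);

# Hypothetical helper: returns 1 when the regex flags the address, else 0.
sub is_rejected { return $_[0] =~ $reject ? 1 : 0 }

for my $case (@cases) {
    my ($addr, $expect) = @$case;
    (my $label = $addr) =~ s/\n/\\n/g;    # keep test names on one line
    is is_rejected($addr), $expect,
        "'$label' " . ($expect ? 'rejected' : 'accepted');
}

done_testing;
```

Written this way, a failing test means the regex disagrees with your stated expectation, not merely that an address was filtered.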

Re: Stopping excessive periods in email
by haukex (Archbishop) on Jul 16, 2021 at 20:42 UTC

    As others have mentioned, adding a single rule to your checks is unlikely to be an effective way of stopping spam - you may want to consider captchas, established software to prevent spam, and so on, instead of trying to roll your own.

    Having said that, regexes are also not really a good way to parse email addresses, which is why I would use Email::Address; you may also be interested in Email::Valid (which isn't perfect either, but still better than hand-rolled checks).

    use warnings;
    use strict;
    use Email::Address;

    sub countdots {
        my @addrs = Email::Address->parse(shift);
        die "expected exactly one address" unless @addrs==1;
        my $x = (my $tmp = $addrs[0]->user) =~ tr/././;
        my $y = ($tmp = $addrs[0]->host) =~ tr/././;
        return wantarray ? ($x,$y) : $x+$y;
    }

    use Test::More;

    my @tests = (
        ['test@gmail.com', 0, 1],
        ['test.test@gmail.com', 1, 1],
        ['test.test.test@gmail.com', 2, 1],
        ['test.test.test.test@gmail.com', 3, 1],
        ['test.test.test.test.test@gmail.com', 4, 1],
        ['test@foo.bar.com', 0, 2],
        ['test.test@foo.bar.com', 1, 2],
    );

    plan tests => 2*@tests+1;

    for my $t (@tests) {
        my ($ad, $exp_x, $exp_y) = @$t;
        my ($got_x, $got_y) = countdots($ad);
        is $got_x, $exp_x;
        is $got_y, $exp_y;
    }
    is countdots($tests[0][0]), $tests[0][1]+$tests[0][2];
Re: Stopping excessive periods in email
by marto (Cardinal) on Jul 16, 2021 at 19:05 UTC

    Have you addressed the issues raised here?

Re: Stopping excessive periods in email
by LanX (Saint) on Jul 16, 2021 at 19:45 UTC
    4 things

    • You'd be better off testing against a blacklist of "rules" (regexes or, even better, subs); your approach with or'ed regexes won't be maintainable.
    • You'll also need some quality management telling you which rule rejected which email address, or at least a log.
    • If that's still your formmailer, attackers will quickly learn how to beat your rules.
    • Who told you I'm Russian?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
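
A minimal sketch of the first two points above, assuming a list of named rules where each rule is a sub that returns true when the address should be rejected. The rule names, the helper `rejected_by`, and the sample addresses are all hypothetical; the checks only approximate the branches of the regex in the question.

```perl
use strict;
use warnings;

# Each rule is a name plus a sub that returns true when the address should
# be rejected. Names and rule set are illustrative assumptions.
my @rules = (
    [ multiple_at_signs => sub { ($_[0] =~ tr/@//) > 1 } ],
    [ contains_comma    => sub { index($_[0], ',') >= 0 } ],
    [ too_many_dots     => sub { ($_[0] =~ tr/.//) > 2 } ],
    [ contains_newline  => sub { $_[0] =~ /\n/ } ],
);

# Returns the name of the first rule that rejects the address, or undef
# when no rule fires.
sub rejected_by {
    my ($addr) = @_;
    for my $rule (@rules) {
        my ($name, $check) = @$rule;
        return $name if $check->($addr);
    }
    return;
}

for my $addr ('test.test@gmail.com', 'a@b@example.com', 'a.b.c.d@gmail.com') {
    my $rule = rejected_by($addr);
    # warn writes to STDERR; a real form handler would more likely append
    # to a log file so rejections can be reviewed later.
    warn "rejected '$addr' by rule '$rule'\n" if defined $rule;
    print "$addr: ", (defined $rule ? "rejected ($rule)" : 'accepted'), "\n";
}
```

Adding, removing, or reordering a rule then touches one line of `@rules`, and every rejection carries the name of the rule that caused it.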

Re: Stopping excessive periods in email
by Marshall (Canon) on Jul 16, 2021 at 18:57 UTC
    Why don't you generate some test cases on your own, then post the results of running those test cases through your code? Show pass/fail for each test case and identify areas where your regex fails (if it does). I am not sure how well this will accomplish your end goal of rejecting bad email addresses. But start with thinking through some test cases. See if you can find actual bogus addresses "seen in the wild".
Re: Stopping excessive periods in email
by Anonymous Monk on Jul 17, 2021 at 13:01 UTC
    Russian hackers <...> bogus email addresses <...> more than 2 dots in an email string

    Please don't do that, you'd be blocking legitimate addresses too.

    Best regards, an owner of an e-mail address that goes like ${firstname}.${lastname}@${division}.${department}.msu.ru

Re: Stopping excessive periods in email
by Anonymous Monk on Jul 17, 2021 at 13:56 UTC

    I'm not sure your regex does what you say it does. You say it looks for two at signs and then a comma and then the periods. As I read it, it looks for two at signs OR a comma OR three periods OR a newline. So any string that has any one of these will match. Specifically, the following all match:

    • ' me @ myself @ I '
    • 'Hello, sailor!'
    • 'Fee. Fie. Foe. Fum.'
    • "\n"
      That's correct. That's what I'm after.

      Thanks!
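
To see the OR behaviour described in this reply, the same four alternatives can be written with the /x modifier and named captures, so that the branch responsible for a match can be reported. This is a sketch only: the capture names and the helper `matched_branch` are made up for illustration and are not part of the original regex.

```perl
use strict;
use warnings;

# The same four alternatives as the regex in the question, spread out with
# /x and wrapped in named captures. The capture names are illustrative.
my $reject = qr/
      (?<two_at_signs> \@ .* \@       )
    | (?<comma>        ,              )
    | (?<three_dots>   \. .* \. .* \. )
    | (?<newline>      \n             )
/x;

# Returns the name of the alternative that matched, or undef if none did.
sub matched_branch {
    my ($string) = @_;
    return unless $string =~ $reject;
    my ($name) = keys %+;    # %+ lists only the groups that captured
    return $name;
}

for my $string (' me @ myself @ I ', 'Hello, sailor!', 'Fee. Fie. Foe. Fum.', "\n") {
    (my $shown = $string) =~ s/\n/\\n/g;    # print the newline visibly
    my $branch = matched_branch($string);
    print "'$shown' => ", (defined $branch ? $branch : 'no match'), "\n";
}
```

Because the alternatives are joined with `|`, a single matching branch is enough for the whole pattern to match.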
