jxz has asked for the wisdom of the Perl Monks concerning the following question:

Hello.
I have an anti-spam control, basically lists of emails (whitelist and blacklist), a domain list, and now I have implemented a regex list.
When the email address matches a regex in the list, the message is considered spam.
It is done with:
if ($email =~ /^$re$/)
Everything is working, except for one regex in my list that is meant to block emails not coming from \.(br|com|net|org)
The regex is: .+\.(?!br|com|net|org)
What am I doing wrong?
Thanks, and sorry for my pig-English :-)

Re: Email suffix matching with a negative look-ahead regexp
by BazB (Priest) on Mar 02, 2003 at 14:44 UTC

    Rather than implementing an anti-spam filter yourself, have you looked at the fine selection of modules on CPAN?

    Mail::Audit and Mail::SpamAssassin make an extremely flexible and easy to use mail filtering toolset.

    Have a look at the various Mail:: modules for all your filtering needs.

    Cheers.

    BazB


    If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
    That way everyone learns.

Re: Email suffix matching with a negative look-ahead regexp
by dws (Chancellor) on Mar 01, 2003 at 23:41 UTC
    The regex is: .+\.(?!br|com|net|org)

    If you're just using () for grouping,

    .+\.(?:br|com|net|org)
    might serve you better, though that's a pretty heavy-handed way to deal with your email.

      dws: No no, he wants to reverse the test.

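      The problem is that (?!...) is a zero-width assertion: it checks what follows but consumes no characters. Since the pattern is anchored as /^$re$/, nothing is left to match the text after the \. and so the period is required to be the last character of the string. That obviously won't work.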

      jxz: basically your regex says the period may not be followed by 'br', 'com', 'net', or 'org', but you're also not permitting anything else to follow it. A solution, although not very pretty, would be to explicitly match a word after the period:

      .+\.(?!br|com|net|org)\w*
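A minimal sketch of the difference between the original pattern and the one above, assuming the same anchored test quoted in the question. The helper name is_flagged and the sample addresses are made up for illustration only.

```perl
use strict;
use warnings;

# Patterns as they might be stored in the regex list (plain strings).
my $original = '.+\.(?!br|com|net|org)';
my $fixed    = '.+\.(?!br|com|net|org)\w*';

# Hypothetical helper: applies the anchored test from the question.
sub is_flagged {
    my ($email, $re) = @_;
    return $email =~ /^$re$/ ? 1 : 0;
}

# Sample addresses, invented for this example.
for my $email ('foo@spammer.tw', 'friend@example.com', 'user@mail.example.org') {
    printf "%-24s original=%d fixed=%d\n",
        $email, is_flagged($email, $original), is_flagged($email, $fixed);
}
```

With these inputs the original pattern flags nothing, because the lookahead consumes no text and the closing $ can never be reached. The version with \w* flags only foo@spammer.tw.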

      The zero-width assertion will make sure that the word doesn't begin with 'br', 'com', 'net', or 'org'. I can't think of any top-level domains that begin with one of those without being equal to it, but if you're worried, then this should work:

      .+\.(?!(?:br|com|net|org)$)\w*
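The two variants differ only for suffixes that start with one of the four names without being equal to it. A small sketch, assuming the anchored test from the question; is_flagged is a hypothetical helper, and the suffix "community" is used purely as an example of a word beginning with "com".

```perl
use strict;
use warnings;

# Rejects any suffix that merely begins with br/com/net/org.
my $prefix_check = '.+\.(?!br|com|net|org)\w*';
# Rejects only suffixes that are exactly br/com/net/org.
my $exact_check  = '.+\.(?!(?:br|com|net|org)$)\w*';

# Hypothetical helper: applies the anchored test from the question.
sub is_flagged {
    my ($email, $re) = @_;
    return $email =~ /^$re$/ ? 1 : 0;
}

# Sample addresses, invented for this example.
for my $email ('someone@example.community', 'someone@example.com') {
    printf "%-28s prefix=%d exact=%d\n",
        $email, is_flagged($email, $prefix_check), is_flagged($email, $exact_check);
}
```

Here someone@example.community is flagged only by the second pattern, while someone@example.com is flagged by neither.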

      BTW, as dws says, this is a pretty heavy-handed way to deal with your email. (I'm in 'nl' myself, so if I emailed you, would it be discarded as spam?)

      Update: changed \w+ to \w* so that addresses ending in a period are also treated as "spam"
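The effect of that change can be checked with a short sketch, again assuming the anchored test from the question. The helper is_flagged and the address with a trailing period are invented for illustration.

```perl
use strict;
use warnings;

my $with_plus = '.+\.(?!br|com|net|org)\w+';   # needs at least one word character after the period
my $with_star = '.+\.(?!br|com|net|org)\w*';   # also accepts nothing after the period

# Hypothetical helper: applies the anchored test from the question.
sub is_flagged {
    my ($email, $re) = @_;
    return $email =~ /^$re$/ ? 1 : 0;
}

my $email = 'someone@example.';   # malformed address ending in a period
printf "%s plus=%d star=%d\n",
    $email, is_flagged($email, $with_plus), is_flagged($email, $with_star);
```

Only the \w* version flags the address that ends in a period. For an address such as foo@spammer.tw both versions behave the same.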

        Thanks, it's working!

        junior:~$ perl -e 'die if "foo\@spammer.tw" =~ /^.+\.(?!br|com|net|org)\w+$/'
        Died at -e line 1.

        OT: I receive a lot of spam from international domains, and with this regexp those messages are sent to a special folder. That causes me no problems, because mailing-list emails are filtered before this rule is applied.