in reply to Email suffix matching with a negative look-ahead regexp

The regex is: .+\.(?!br|com|net|org)

If you're just using () for grouping,

.+\.(?:br|com|net|org)
might serve you better, though that's a pretty heavy-handed way to deal with your email.
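
For anyone unsure what the difference is, here is a minimal sketch (the sample address is made up for illustration): plain () both groups and captures into $1, while (?:) only groups.

```perl
use strict;
use warnings;

# Hypothetical sample address, used only for illustration.
my $addr = 'user@example.com';

# Capturing group: the matched suffix is returned (and stored in $1).
my ($captured) = $addr =~ /.+\.(br|com|net|org)/;

# Non-capturing group: the same text matches, but nothing is captured.
my $matched = $addr =~ /.+\.(?:br|com|net|org)/ ? 1 : 0;

print "captured: $captured\n";
print "matched:  $matched\n";
```

Neither form reverses the test; both match addresses whose suffix is in the list.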

Re: Re: Email suffix matching with a negative look-ahead regexp
by xmath (Hermit) on Mar 01, 2003 at 23:54 UTC
    dws: No no, he wants to reverse the test.

    The problem is that the look-ahead is a zero-width assertion: it consumes no characters, so the \. is still required to be at the end of the string, and that obviously won't work.

    jwx: basically, your regex says the period may not be followed by 'br', 'com', 'net', or 'org', but it also doesn't permit anything else to follow it. A solution, although not a very pretty one, would be to explicitly match a word after the period:

    .+\.(?!br|com|net|org)\w*

    The zero-width assertion makes sure that the word doesn't begin with 'br', 'com', 'net', or 'org'. I can't think of any top-level domains that begin with one of those without being equal to it, but if you're worried, then this should work:

    .+\.(?!(?:br|com|net|org)$)\w*
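
    To see the difference between the two patterns, here is a rough sketch. It assumes the pattern is anchored with ^ and $, and the addresses (including the 'community' suffix) are made up just to exercise the prefix case:

    ```perl
    use strict;
    use warnings;

    # Assumption: the filter anchors the pattern at both ends.
    my $prefix_re = qr/^.+\.(?!br|com|net|org)\w*$/;          # rejects by prefix
    my $exact_re  = qr/^.+\.(?!(?:br|com|net|org)$)\w*$/;     # rejects exact suffix only

    # Returns 'spam' when the pattern matches, 'ok' otherwise.
    sub verdict {
        my ($re, $addr) = @_;
        return $addr =~ $re ? 'spam' : 'ok';
    }

    # Hypothetical addresses, for illustration only.
    for my $addr ('user@example.com', 'user@example.nl', 'user@example.community') {
        printf "%-24s prefix: %-4s  exact: %s\n",
            $addr, verdict($prefix_re, $addr), verdict($exact_re, $addr);
    }
    ```

    Both patterns let 'com' through and flag 'nl'. Only the second one flags 'community', because the first look-ahead already fails on the leading 'com'.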

    BTW, as dws says, this is a pretty heavy-handed way to deal with your email. (I'm in 'nl' myself, so if I emailed you, would it be discarded as spam?)

    •Update: changed \w+ to \w* so that addresses ending in a period are also treated as "spam"
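
    A quick sketch of what that change does, again assuming the pattern is anchored with ^ and $ and using a made-up address that ends in a period:

    ```perl
    use strict;
    use warnings;

    # Hypothetical address with a trailing period, for illustration only.
    my $addr = 'user@example.';

    # \w+ needs at least one word character after the period, so no match.
    my $plus = $addr =~ /^.+\.(?!br|com|net|org)\w+$/ ? 'spam' : 'ok';

    # \w* also accepts an empty suffix, so the address is flagged.
    my $star = $addr =~ /^.+\.(?!br|com|net|org)\w*$/ ? 'spam' : 'ok';

    print "with \\w+: $plus\n";   # ok
    print "with \\w*: $star\n";   # spam
    ```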

      Thanks, it's working!

      junior:~$ perl -e 'die if "foo\@spammer.tw"=~/^.+\.(?!br|com|net|org)\w+$/'
      Died at -e line 1.

      OT: I receive a lot of spam from international domains, and with this regexp those messages are sent to a special folder. It doesn't cause me any problems, because mailing-list emails are filtered out beforehand.

        OT, a good way to test patterns like these is:
        perl -lne 'print /^.+\.(?!br|com|net|org)\w+$/ ? "spam" : "ok"'

        Every line of input will be tested, so you can try out various email addresses to see whether the pattern matches properly.

        Maybe you already knew the -n option, but I thought I'd mention it in case you didn't.
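
        If the one-liner looks too magic, this is roughly what -n and -l amount to when written out as a script. It is only a sketch: the classify name is made up here, and an in-memory filehandle with two sample addresses stands in for standard input.

        ```perl
        use strict;
        use warnings;

        # Same test as the one-liner, wrapped in a sub (hypothetical name).
        sub classify {
            my ($line) = @_;
            return $line =~ /^.+\.(?!br|com|net|org)\w+$/ ? 'spam' : 'ok';
        }

        # Stand-in for STDIN: an in-memory filehandle over sample input.
        my $input = join "\n", 'foo@spammer.tw', 'user@example.com', '';
        open my $in, '<', \$input or die "cannot open in-memory handle: $!";

        # -n supplies the loop over input lines; -l strips the newline on
        # input and adds one back on output.
        while (my $line = <$in>) {
            chomp $line;
            print classify($line), "\n";
        }
        close $in;
        ```

        With the sample input this prints "spam" for the 'tw' address and "ok" for the 'com' one.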