in reply to Spam filtering regexp - keyword countermeasure countermeasure

I am also using POPFile, and I haven't noticed it failing to catch spam with "split" words. Most of the time there seem to be enough other clues to categorize the e-mail as spam or not.

If it ain't broken, don't repair it!

However, as it is an interesting question, I would go along the following track:

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Re: Spam filtering regexp - keyword countermeasure countermeasure
by shemp (Deacon) on May 12, 2003 at 21:15 UTC
    To help avoid inadvertently collapsing a *legitimate* single character, you could maybe use the idea that the "funny" separator is the same between all the things that should be collapsed (until the spammers catch on to that).
    It will be a not fun day when they do random grouping of chars in their headers, i.e.:
    viagra = V *IA *GR  **A
    But, ignoring that situation, get the separator, " *" for instance, and remove all occurrences. Just a thought.
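
    The "same separator everywhere" idea could be sketched roughly as follows. This is only an illustration in Python, not anything POPFile does: the function name `collapse_uniform_separator`, the limit of three separator characters, and the requirement of at least three single characters in a row are all assumptions made for the example.

```python
import re

# Hypothetical helper: collapses runs of single characters that are all
# joined by the SAME non-word separator, e.g. "V *I *A *G *R *A".
# Assumptions: the separator is 1 to 3 non-word characters, and a run needs
# at least three single characters, so that pairs such as "a - b" are left alone.
_SPLIT_WORD = re.compile(
    r"(?<!\w)"                  # run must not start inside a word
    r"\w(?P<sep>[^\w\n]{1,3})"  # first character, then capture the separator
    r"\w(?:(?P=sep)\w)+"        # same separator before every later character
    r"(?!\w)"                   # run must not end inside a word
)


def collapse_uniform_separator(text: str) -> str:
    """Remove the repeated separator from each uniformly split word."""
    def _join(match: re.Match) -> str:
        return match.group(0).replace(match.group("sep"), "")
    return _SPLIT_WORD.sub(_join, text)


if __name__ == "__main__":
    print(collapse_uniform_separator("Buy V *I *A *G *R *A now"))
    print(collapse_uniform_separator("v-i-a-g-r-a"))
    # Random grouping defeats the heuristic, as feared above:
    print(collapse_uniform_separator("V *IA *GR  **A"))
```

    Ordinary text such as "I am a fan" is left untouched, but a legitimate run like "A B C" would still be collapsed, so the result is better treated as one more clue for the filter than as a definitive rewrite.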

      Sorry to be the bearer of bad tidings, but I saw some spam a couple of days ago that compared its product to "vi ag r a".

      Hugo
        Ah yes, but would you buy V1agra? :)

        Saw that one too.

        There is no emoticon for what I'm feeling now.

Re: Re: Spam filtering regexp - keyword countermeasure countermeasure
by John M. Dlugosz (Monsignor) on May 12, 2003 at 19:15 UTC
    That sounds pretty good. Thanks for the pointer.