There's a good article over at perl.com about stopping spam with SpamAssassin. From the article...
SpamAssassin is a rule-based spam identification tool. It's written in Perl, and there are several ways of using it: You can call a client program, spamassassin, and have it determine whether a given message is likely to be spam; you can do essentially the same thing but use a client/server approach so that your client isn't always loading and parsing the rules each time mail comes; or, finally, you can use a Perl module interface to filter spam from a Perl program.

Other interesting parts include using Mail::Audit and some information on Vipul's Razor, which is a "distributed, collaborative, spam detection and filtering network." I also particularly enjoyed the call $spamtest->report_as_spam($mail); which reports a piece of spam to Vipul's Razor.

(shockme) Re: Stopping Spam with SpamAssassin
by shockme (Chaplain) on Mar 09, 2002 at 14:47 UTC
    There's a tutorial on using these modules in the Tutorials section. jcwren's comments are particularly insightful.

    If things get any worse, I'll have to ask you to stop helping me.

Re: Stopping Spam with SpamAssassin
by shotgunefx (Parson) on Mar 09, 2002 at 14:51 UTC
    I read it. Looks good. I was thinking about Razor though. The article says that varying the spacing will stop a mail from matching. Why don't they normalize the spacing? Any run of whitespace becomes a single space, and all characters are folded to one case. That would make it harder for spammers to slip by.
    Now if you could just figure out an easy heuristic to tell whether a word token looks like an MD5 or SHA string, it would be even harder. Any thoughts?
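
To make the idea above concrete, here is a minimal Python sketch of normalizing a message body before hashing it. This is not how Razor computes its signatures; the function names `normalize` and `signature` are hypothetical, and the "looks like a hash" heuristic simply assumes a token of exactly 32 or 40 hex digits (the lengths of MD5 and SHA-1 hex digests).

```python
import hashlib
import re

# Assumption: a "hash-like" token is a standalone run of exactly
# 32 (MD5) or 40 (SHA-1) hex digits. Applied after lowercasing.
HASH_LIKE = re.compile(r"\b(?:[0-9a-f]{32}|[0-9a-f]{40})\b")


def normalize(body: str) -> str:
    """Lowercase the text, drop hash-like tokens, collapse whitespace runs."""
    text = body.lower()
    text = HASH_LIKE.sub("", text)
    # str.split() with no arguments splits on any run of whitespace,
    # so joining with a single space collapses every run to one space.
    return " ".join(text.split())


def signature(body: str) -> str:
    """Hex SHA-1 digest of the normalized body (hypothetical scheme)."""
    return hashlib.sha1(normalize(body).encode("utf-8")).hexdigest()
```

With this, two copies of a message that differ only in spacing, letter case, and a trailing 32-digit hex token produce the same signature. Random junk that is not hex, or not one of the assumed lengths, would still get through.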

    -Lee

    "To be civilized is to deny one's nature."
      Spaces are not the main problem with Razor. The real problem is that it reports too many false positives. For example, I have seen it mark a lot of "good" emails on the Bugtraq mailing list as spam.

      --
      Ilya Martynov (http://martynov.org/)

        1) Even if it normalized spaces, it still wouldn't be able to deal with the random characters/digits that many spammers put at the end of msgs (or sometimes in the subjects) ... so there's not much point in normalizing the spaces.

        2) As I understand it, the problem with false positives is actually a problem with people poisoning it. It won't call something "spam" unless someone says it's spam, and there are a lot of assholes out there who think it's amusing to call widely subscribed lists "spam".

        The article makes a good suggestion: don't trust Razor outright; use it as a SpamAssassin rule that contributes to a message's score.
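
A rough Python sketch of what "contributes to a message's score" means in practice: each rule that fires adds its weight, and the message is flagged only when the total reaches a threshold. The rule names, weights, and threshold below are made up for illustration and are not SpamAssassin's actual rules or scores.

```python
# Hypothetical rule names and weights, for illustration only.
RULE_SCORES = {
    "razor_listed": 3.0,         # message was found in Razor
    "subject_all_caps": 1.5,
    "mentions_free_money": 2.0,
}

THRESHOLD = 5.0  # assumed cutoff for calling a message spam


def spam_score(hits):
    """Sum the weights of the rules that fired; unknown rules count as 0."""
    return sum(RULE_SCORES.get(name, 0.0) for name in hits)


def is_spam(hits, threshold=THRESHOLD):
    """Flag the message only when the combined score reaches the threshold."""
    return spam_score(hits) >= threshold
```

Under these assumed weights, a Razor hit alone scores 3.0 and does not flag the message, so a poisoned Razor entry for a mailing list post is not enough by itself; it takes at least one more rule firing to cross the threshold.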