in reply to Bayesian Filtering for Spam
Mr. Graham goes on to show a few lines of Lisp:Let me try to help you on your way.that do the calculations (indecipherable to me, I grew up in Perl)(...)(let ((g (* 2 (or (gethash word good) 0))) (b (or (gethash word bad) 0))) (unless (< (+ g b) 5) (max .01 (min .99 (float (/ (min 1 (/ b nbad)) (+ (min 1 (/ g ngood)) (min 1 (/ b nbad)))))))) and then . . . (let ((prod (apply #'* probs))) (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x)) probs)))))I immediately dived into the page to figure out the 'magic formula' that I can start building a spam filter around . . . and basically sat drooling at the screen for 20 minutes trying to grok what was being talked about.
starting with the weirdest critters: "mapcar" is in Lisp what map() is in Perl. The first atom is quoted so it isn't executed immediately. "lambda" defines an anonymous function (here with one parameter, x). And "apply" in Lisp is what some perlers may know as reduce(). There's been extensive talk about it in the Perl6 RFC's, and the library List::Util implements it for people who'd like to use it with current day perls. Heh: the same library implements min() and max() as well. Good. So let's use that.
Here's an attempt at a literal conversion into Perl:
There. That wasn't so hard, was it? But it's just a starting point, though. You won't filter any spam with it just like that, just yet.use List::Util qw(min max reduce); sub score { my($word) = @_; # uses global %good, %bad, $ngood, $nbad my $g = 2 * ($good{$word} || 0); my $b = $bad{$word} || 0; unless($g + $b < 5) { return max(0.01, min (0.99, min (1, $b / $nbad)/ (min(1, $g / $ngood) + min (1, $b / $nbad)))); } # otherwise: return undef } sub prob { my @probs = @_; my $prod = reduce { $a * $b } @probs; return $prod / ($prod + reduce { $a * $b } map { 1 - $_ } @probs); }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: Bayesian Filtering for Spam
by oakbox (Chaplain) on Aug 17, 2002 at 22:37 UTC |