Mr. Graham goes on to show a few lines of Lisp:
    (let ((g (* 2 (or (gethash word good) 0)))
          (b (or (gethash word bad) 0)))
      (unless (< (+ g b) 5)
        (max .01
             (min .99 (float (/ (min 1 (/ b nbad))
                                (+ (min 1 (/ g ngood))
                                   (min 1 (/ b nbad)))))))))

and then . . .

    (let ((prod (apply #'* probs)))
      (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x)) probs)))))

that do the calculations (indecipherable to me, I grew up in Perl)(...)

I immediately dived into the page to figure out the 'magic formula' that I can start building a spam filter around . . . and basically sat drooling at the screen for 20 minutes trying to grok what was being talked about.

Let me try to help you on your way.

Starting with the weirdest critters: "mapcar" is in Lisp what map() is in Perl. The #' in front of a function name (or a lambda) quotes it, so the function itself is passed along as a value instead of being called on the spot. "lambda" defines an anonymous function (here with one parameter, x). "apply" calls a function with the elements of a list as its arguments; since * takes any number of arguments in Lisp, (apply #'* probs) multiplies the whole list together, which is what some perlers may know as reduce(). There's been extensive talk about it in the Perl 6 RFCs, and the library List::Util implements it for people who'd like to use it with current-day perls. Heh: the same library implements min() and max() as well. Good. So let's use that.
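To make the mapping concrete, here is a small sketch of the two Lisp idioms next to their Perl counterparts. The values in @probs are made up for illustration, and the variable names are mine, not Mr. Graham's.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Made-up probabilities, chosen so the arithmetic is exact.
my @probs = (0.5, 0.25, 0.5);

# Lisp: (mapcar #'(lambda (x) (- 1 x)) probs)
# Perl: map with a block; $_ plays the role of the lambda's x.
my @complements = map { 1 - $_ } @probs;

# Lisp: (apply #'* probs)
# Perl: reduce folds the list with the block; $a and $b are the two operands.
my $product            = reduce { $a * $b } @probs;
my $complement_product = reduce { $a * $b } @complements;

print "complements: @complements\n";                       # 0.5 0.75 0.5
print "product: $product\n";                               # 0.0625
print "product of complements: $complement_product\n";    # 0.1875
```

Note that reduce returns undef for an empty list, so real code would want to check that @probs is not empty first.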

Here's an attempt at a literal conversion into Perl:

    use List::Util qw(min max reduce);

    # Probability that a single word indicates spam.
    # Uses globals %good, %bad, $ngood, $nbad.
    sub score {
        my ($word) = @_;
        my $g = 2 * ($good{$word} || 0);
        my $b = $bad{$word} || 0;
        unless ($g + $b < 5) {
            return max(0.01,
                       min(0.99,
                           min(1, $b / $nbad) /
                           (min(1, $g / $ngood) + min(1, $b / $nbad))));
        }
        # otherwise: the word is too rare to say anything about it
        return undef;
    }

    # Combine the per-word probabilities into a single one.
    sub prob {
        my @probs = @_;
        my $prod = reduce { $a * $b } @probs;
        return $prod / ($prod + reduce { $a * $b } map { 1 - $_ } @probs);
    }

There. That wasn't so hard, was it? It's just a starting point, though. You won't filter any spam with it as it stands, not yet.
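To see the two functions working together, here is a rough sketch on toy data. The word counts and message totals are invented, and spam_probability() is a hypothetical helper of mine; tokenizing mail and building the counts from real corpora is not covered here. I also renamed the lexicals to $gcount and $bcount so they can't be confused with reduce's $a and $b.

```perl
use strict;
use warnings;
use List::Util qw(min max reduce);

# Invented toy data: word => number of occurrences in each corpus.
our %good  = (meeting => 10, perl => 8, free => 1);
our %bad   = (free => 12, viagra => 9, perl => 1);
our $ngood = 20;    # number of good messages (made up)
our $nbad  = 20;    # number of spam messages (made up)

# Per-word spam probability, or undef for words seen too rarely.
sub score {
    my ($word) = @_;
    my $gcount = 2 * ($good{$word} || 0);
    my $bcount = $bad{$word} || 0;
    return undef if $gcount + $bcount < 5;
    my $gfrac = min(1, $gcount / $ngood);
    my $bfrac = min(1, $bcount / $nbad);
    return max(0.01, min(0.99, $bfrac / ($gfrac + $bfrac)));
}

# Combine the per-word probabilities into a single one.
sub prob {
    my @probs = @_;
    my $prod = reduce { $a * $b } @probs;
    my $inv  = reduce { $a * $b } map { 1 - $_ } @probs;
    return $prod / ($prod + $inv);
}

# Hypothetical helper: score every word, skip the unknown ones, combine.
sub spam_probability {
    my @words = @_;
    my @probs = grep { defined } map { score($_) } @words;
    return undef unless @probs;    # nothing to go on
    return prob(@probs);
}

printf "%-20s %.4f\n", 'free viagra hello', spam_probability(qw(free viagra hello));
printf "%-20s %.4f\n", 'meeting perl',      spam_probability(qw(meeting perl));
```

With these invented counts the first word list comes out close to 1 and the second close to 0, which is the behaviour you'd hope for; how well it holds up on real mail depends entirely on the corpora you feed it.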

In reply to Re: Bayesian Filtering for Spam by bart
in thread Bayesian Filtering for Spam by oakbox
