You may think whatever you want about Paul Graham, but he's not that stupid. I am continually surprised that people don't seem to get how and why Bayesian filtering works so effectively for old-fashioned (more on that in a bit) spam.
Let me ask once more: how likely do you deem "M0n-stur" to be in legitimate mail? How likely is it in spam? And what is the ratio of these probabilities? Now, how likely is "Monster" to be in legitimate mail? How likely is it in spam? And what is the ratio of these probabilities?
Result: "M0n-stur" only appears in mails that are spam. "Monster" appears in mail that is probably around 30-80% spam, depending on your specific mail traffic. This means you do not want to map the variation back to "monster". The presence of a variation is almost a dead give-away of spam.
This is why naive Bayesian filtering works as well as it does for spam so far, despite being naive.
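To put rough numbers on that argument, here is a minimal sketch of a per-token spam probability. The message counts are made up for illustration, and the smoothed estimate with equal priors is a simplification I picked, not Paul Graham's exact formula.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate P(spam | token) assuming equal priors, with add-one smoothing."""
    p_token_given_spam = (spam_count + 1) / (n_spam + 2)
    p_token_given_ham = (ham_count + 1) / (n_ham + 2)
    return p_token_given_spam / (p_token_given_spam + p_token_given_ham)

# Hypothetical corpus: 1000 spam and 1000 legitimate messages.
N_SPAM, N_HAM = 1000, 1000

# Hypothetical counts of messages containing each token: (in spam, in legitimate mail)
counts = {
    "M0n-stur": (40, 0),   # the obfuscated form never shows up in legitimate mail
    "Monster": (60, 40),   # the plain word shows up in both
}

scores = {
    token: token_spam_probability(spam, ham, N_SPAM, N_HAM)
    for token, (spam, ham) in counts.items()
}

# What happens if the filter "helpfully" maps the variation back to the plain word:
# the counts get pooled and the strong signal is diluted.
normalized_score = token_spam_probability(40 + 60, 0 + 40, N_SPAM, N_HAM)

for token, score in scores.items():
    print(f"{token}: {score:.3f}")
print(f"variation mapped back to plain word: {normalized_score:.3f}")
```

With these invented counts the obfuscated token scores far higher than the plain word, and pooling the two throws most of that difference away.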
This extreme effectiveness of Bayesian filters against obfuscated variations of keywords has prompted spammers to move beyond variations. They now resort to circumlocution and do not mention viagra, monster rods, or whatever it is they're advertising at all.
I am now occasionally getting mail along the lines of
Subject: I never thought I'd see better days
I was really in a bind until I found this, and now I can even afford to live carelessly. Believe me, it works.
There is absolutely nothing in there that any kind of content-based filter could pick out, unless it were to actually understand the message.
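A rough sketch of why such a message slips through: a naive Bayes filter combines per-token spam probabilities, and if every token is close to neutral, the combined score stays close to neutral too. The per-token values below are invented for illustration, and the log-odds combination under equal priors is one common way to do it, not any particular filter's implementation.

```python
import math

def combined_spam_probability(token_probs):
    """Combine per-token P(spam | token) values the naive Bayes way.

    Assumes the tokens are independent and the priors are equal,
    so the log-odds of the individual tokens simply add up.
    """
    log_odds = sum(math.log(p) - math.log(1 - p) for p in token_probs)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical message containing one obfuscated keyword among ordinary words.
obfuscated = combined_spam_probability([0.98, 0.60, 0.50, 0.45])

# Hypothetical message made entirely of everyday words, like the example above.
paraphrased = combined_spam_probability([0.55, 0.50, 0.45, 0.50, 0.52])

print(f"obfuscated keyword present: {obfuscated:.2f}")
print(f"everyday words only:        {paraphrased:.2f}")
```

One strongly spammy token is enough to push the first message well above any sensible threshold, while the second hovers around even odds and gives the filter nothing to act on.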
This is why content-based filtering is a dead end. Most of the things you describe will only fool rule-based filters; statistical filters, a family of which Bayes is just one member, will pick them up reliably. But they cannot comprehend the message; hence spam such as the example I outlined above, which tachyon and Andy Lester have observed as well.
Makeshifts last the longest.