
Re: Re: Re: From a SpamAssassin developer

by Elian (Parson)
on Aug 19, 2002 at 07:11 UTC

in reply to Re: Re: From a SpamAssassin developer
in thread Bayesian Filtering for Spam

Don't be too surprised that Paul's solution isn't a good general-purpose one. His data set's probably quite small, with good locality, and odds are he tuned his results to fit his own data. It's not that his methods are bad for his needs, just that his needs are rather different from most people's.
Re: Re: Re: Re: From a SpamAssassin developer
by Matts (Deacon) on Aug 19, 2002 at 16:24 UTC
    I'm not surprised. Not even slightly - see my original post.

    The biggest thing about statistical analysis is that you simply cannot test it on the training data set. I get 100% accuracy when I do that, and it's not surprising. I'm speculating that's what PG did, but I could be wrong. There's also the fact that training often overfits. None of this is news to anyone versed in machine learning (which I'm starting to be ;-)
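
To make the train-versus-test point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy messages are invented, the names `train`, `classify`, `accuracy`, `TRAIN` and `HELD_OUT` are hypothetical, and the scorer is a simplified naive Bayes with add-one smoothing, not the method used by PG or by SpamAssassin.

```python
import math
from collections import Counter

# Invented toy data: (message text, label). Not real mail.
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
HELD_OUT = [
    ("buy cheap watches", "spam"),
    ("money for lunch", "ham"),
    ("monday sale", "spam"),
    ("see you now", "ham"),
]

def train(messages):
    """Count, per label, how many messages contain each token."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in messages:
        totals[label] += 1
        counts[label].update(set(text.lower().split()))
    return counts, totals

def classify(model, text):
    """Return the label with the higher smoothed log score."""
    counts, totals = model
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        # Log prior for the label.
        score = math.log(totals[label] / sum(totals.values()))
        # Add-one smoothing so unseen tokens do not zero out the score.
        denom = sum(counts[label].values()) + len(vocab)
        for tok in set(text.lower().split()):
            score += math.log((counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(model, messages):
    """Fraction of messages whose predicted label matches the true one."""
    hits = sum(classify(model, text) == label for text, label in messages)
    return hits / len(messages)

model = train(TRAIN)
print("accuracy on training data:", accuracy(model, TRAIN))
print("accuracy on held-out data:", accuracy(model, HELD_OUT))
```

On this toy data the model scores perfectly on the messages it was trained on and noticeably worse on the held-out ones, because it has learned that "now" means spam and "monday" means ham. That is the overfitting effect in miniature: only the held-out number says anything about how the filter will do on new mail.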

