PerlMonks  

Re^5: standard deviation accuracy question

by thor (Priest)
on Jul 23, 2005 at 23:19 UTC ( [id://477527]=note )


in reply to Re^4: standard deviation accuracy question
in thread standard deviation accuracy question

Thanks much! There's a little hand waving going on here, but the explanation is good enough for me.

thor

Feel the white light, the light within
Be your own disciple, fan the sparks of will
For all of us waiting, your kingdom will come

Re^6: standard deviation accuracy question
by tlm (Prior) on Jul 24, 2005 at 00:45 UTC

    The best intuitive explanation I have come across for why dividing the sum of squared deviations from the sample mean by N underestimates the population variance is that the sample mean "follows" the sample; i.e. the sum of squared deviations of the sample from its own mean is almost always smaller than the sum of squared deviations from the population mean (and it is never larger). This is the source of the bias frodo72 alluded to.
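The "never larger" part can be made precise with a standard identity. The notation is assumed here rather than taken from the thread: x_1, ..., x_N are the sample values, \bar{x} is the sample mean, and \mu is the population mean.

```latex
\sum_{i=1}^{N} (x_i - \mu)^2
  = \sum_{i=1}^{N} (x_i - \bar{x})^2 + N(\bar{x} - \mu)^2
```

The cross term vanishes because the deviations from \bar{x} sum to zero. The last term is never negative, so the spread about the sample mean can only equal the spread about \mu when \bar{x} happens to coincide with \mu.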

    This intuitive argument only shows that simply taking the sample average of squared deviations from the sample mean will underestimate the population variance, but it does not at all prove that N/(N − 1) is the right correction factor. I don't know of an intuitive argument for this, but a nice rigorous derivation can be found here.
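For reference, one short way to get the factor (not necessarily the derivation linked above) is sketched below. It assumes the x_i are independent draws from a population with mean \mu and finite variance \sigma^2; the symbols are introduced here for illustration.

```latex
\begin{align*}
\sum_{i=1}^{N} (x_i - \bar{x})^2 &= \sum_{i=1}^{N} x_i^2 - N\bar{x}^2 \\
E[x_i^2] &= \sigma^2 + \mu^2, \qquad
E[\bar{x}^2] = \frac{\sigma^2}{N} + \mu^2 \\
E\!\left[\sum_{i=1}^{N} (x_i - \bar{x})^2\right]
  &= N(\sigma^2 + \mu^2) - N\!\left(\frac{\sigma^2}{N} + \mu^2\right)
   = (N - 1)\,\sigma^2
\end{align*}
```

Dividing the sum by N therefore gives (N − 1)/N times \sigma^2 on average, and multiplying by N/(N − 1) removes that bias.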

    the lowliest monk

Re^6: standard deviation accuracy question
by polettix (Vicar) on Jul 24, 2005 at 13:45 UTC
    This is no explanation, but it may help. When you concentrate on a sample instead of the entire population, you're making two estimates: the mean and the variance (the square of the standard deviation). The issue is that when you estimate the variance, you're subtracting the estimated mean from each item in the sample: you're using one estimate inside another.

    Which leads us to the concept of degrees of freedom. The sample has N degrees of freedom, i.e. N ways to vary: each of the N items can take a different value. Thus, when you estimate the mean value, you divide by N.

    When you estimate the variance, you're using the mean value evaluated over the sample, as noted above. Because you're implicitly trusting that mean value to be correct (otherwise you wouldn't use it to evaluate the variance!), you're stealing a degree of freedom. I mean: if you fix the value of the mean, you can freely move only (N-1) items, and the N-th is bound to take the value that yields the given mean. Thus, a variance evaluated in this way only takes into account the variations contributed by (N-1) items, not N.
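A small Python simulation can make both points concrete. This is only a sketch under stated assumptions: samples are drawn from a normal distribution with a known variance, and the function names, sample size, trial count and seed are all made up for illustration.

```python
import random


def variance_estimates(sample):
    """Return (SS / N, SS / (N - 1)), where SS is the sum of squared
    deviations from the sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


def average_estimates(n=5, trials=20000, sigma=2.0, seed=1):
    """Average both estimators over many samples from N(0, sigma^2).
    All defaults are illustrative assumptions."""
    rng = random.Random(seed)
    total_n = total_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_n, by_n1 = variance_estimates(sample)
        total_n += by_n
        total_n1 += by_n1
    return total_n / trials, total_n1 / trials


def last_deviation(first_deviations):
    """Deviations from the sample mean sum to zero, so once N - 1 of them
    are known the last one is forced: this is the 'stolen' degree of freedom."""
    return -sum(first_deviations)


if __name__ == "__main__":
    biased, unbiased = average_estimates()
    print(f"true variance 4.0, divide by N: {biased:.3f}, "
          f"divide by N-1: {unbiased:.3f}")
```

With a true variance of 4 and N = 5, the divide-by-N average should settle near (N − 1)/N × 4 = 3.2, while the divide-by-(N − 1) average should settle near 4.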

    Hope that this intuitively helps :)

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
