in reply to Unexpected under-dispersion in random simulations

I think that your range testing is suspect.

  1. The test for the inverted range should at least be: ( $_->[1] < $_->[0] and $_->[1] >= $some_pos )

    That is, the test should be that the end is greater than or equal to $some_pos (>= rather than >).

  2. Also, whilst the test works for your current arbitrarily chosen position, 36, it would fail when that position is greater than 900.

    I.e., if $some_pos = 950, then a range of [0, 901] would fail:

    $some_pos = 950;
    print +( $_->[1] < $_->[0] and $_->[1] >= $some_pos ) ? 'contained' : 'not contained' for [0, 901];;
    not contained
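Putting both points together, a wrap-aware test has to look at the start of an inverted range as well as its end. A minimal sketch, assuming ranges are stored as [start, end] pairs with inclusive ends on a circular sequence, and that end < start means the range wraps past the last position. `covers` is a hypothetical helper name, not something from the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: does the range [start, end] cover $pos?
# A range whose end is smaller than its start is assumed to wrap
# around the end of a circular sequence (e.g. [901, 0]).
sub covers {
    my ( $range, $pos ) = @_;
    my ( $start, $end ) = @$range;
    if ( $start <= $end ) {
        # Ordinary range: pos must lie between the ends, inclusive.
        return $start <= $pos && $pos <= $end;
    }
    # Wrapped range: pos lies either in the tail (start .. last position)
    # or in the head (0 .. end).
    return $pos >= $start || $pos <= $end;
}

for my $pos ( 36, 950 ) {
    printf "%d: %s\n", $pos,
        covers( [ 901, 0 ], $pos ) ? 'contained' : 'not contained';
}
```

Checking `$pos >= $start` is what catches positions above 900 in a wrapped range; checking only the end, as the original test does, misses them.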

However, neither of those affects the discrepancy you note.

I also thought that using a better rand() might change things, but as you note, Math::Random::MT shows the same discrepancy.

Then I thought perhaps the difference might be due to sample vs. population variance, and discovered that there appears to be an undocumented parameter to the variance() method that might be intended to affect the calculation:

    sub variance {
        my $self = shift;  ##Myself
        my $div = @_ ? 0 : 1;
        ...
        $variance /= $count - $div;

I.e., if you pass any argument(s) to the variance method, it appears (I think) to divide by $count rather than $count - 1, and so calculate the population rather than the sample variance. Update: the two variants are the biased and unbiased estimates of variance.
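For reference, the two estimates differ only in the divisor applied to the sum of squared deviations, which is what the `$div` switch in the excerpt toggles. A small sketch of both, assuming nothing about the module beyond that excerpt; `variance_of` and its `$biased` flag are hypothetical names used for illustration:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: sum of squared deviations divided by n (biased,
# "population" estimate) or by n - 1 (unbiased, "sample" estimate).
sub variance_of {
    my ( $data, $biased ) = @_;
    my $n    = @$data;
    my $mean = sum(@$data) / $n;
    my $ss   = sum( map { ( $_ - $mean )**2 } @$data );
    return $ss / ( $biased ? $n : $n - 1 );
}

my @x = ( 2, 4, 4, 4, 5, 5, 7, 9 );
printf "biased:   %.4f\n", variance_of( \@x, 1 );    # divides by n
printf "unbiased: %.4f\n", variance_of( \@x, 0 );    # divides by n - 1
```

The ratio between the two is always n/(n - 1), so with thousands of samples the difference is tiny.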

But, as expected, whilst it does make a difference, it is very small:

    [21:59:37.22] C:\test>874353
    10000: 10.003 9.138 10.0028 9.13839215999999

    [21:59:44.34] C:\test>874353
    10000: 10.003 9.139 10.0028 9.13930609060905
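The size of that change is what the divisor switch alone would predict. A quick arithmetic check, assuming the leading 10000 on each line is the number of samples and that the first run divided by n while the second divided by n - 1:

```perl
use strict;
use warnings;

# The two full-precision variance figures from the runs above.
# Assumption: n = 10000 samples, first run biased (/ n), second unbiased (/ (n - 1)).
my $n        = 10000;
my $biased   = 9.13839215999999;
my $unbiased = 9.13930609060905;

# If only the divisor changed, unbiased = biased * n / (n - 1).
my $predicted = $biased * $n / ( $n - 1 );
printf "predicted %.10f, observed %.10f\n", $predicted, $unbiased;
```

The two agree to well beyond the printed precision, so the parameter accounts for the whole difference between the runs and nothing else.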

At this point, I see two possibilities that I don't have answers to, or the knowledge to verify:

Not much help, but it might trigger some thoughts somewhere.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^2: Unexpected under-dispersion in random simulations
by daverave (Scribe) on Nov 30, 2010 at 08:34 UTC
    While I accept your first correction (>= instead of >), I don't understand the second point. 950 really is not contained in [0, 901].

      Sorry, I swapped the ends around when posting. That should be [901, 0], but the point still stands:

      $some_pos = 950;
      print +( $_->[1] < $_->[0] and $_->[1] >= $some_pos ) ? 'contained' : 'not contained' for [901,0];;
      not contained

      A range of 100 starting at 901 will wrap around to end at position 0, thereby covering position 950; but your test fails to detect it.
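      To make the wrap-around concrete, here is a sketch that computes where such a range ends and which positions it covers. It assumes (as the 900 threshold suggests) a circular sequence of 1000 positions numbered 0 to 999 and ranges of length 100; the variable names are illustrative only:

```perl
use strict;
use warnings;

# Assumptions (not stated in the original code): positions run 0 .. 999
# on a circle, and a range of length 100 starting at $start covers
# $start .. $start + 99, wrapping modulo the circle size.
my $size   = 1000;
my $length = 100;
my $start  = 901;

my $end     = ( $start + $length - 1 ) % $size;                  # 1000 % 1000 = 0
my @covered = map { ( $start + $_ ) % $size } 0 .. $length - 1;  # 901 .. 999, then 0

printf "range [%d, %d] covers %d positions\n", $start, $end, scalar @covered;
print "950 is covered\n" if grep { $_ == 950 } @covered;
```

      The end (0) is smaller than the start (901), yet 950 is among the covered positions, which is exactly the case a test that only compares the end against $some_pos cannot see.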

