DrHyde has asked for the wisdom of the Perl Monks concerning the following question:
Given that I trust the original data to be random, I still need to be sure that what I'm doing to the data isn't biassing it. Bias could creep in because of, for example:
- my algorithm sucks
- an off-by-one error
The question is, then, how to test that my output data is nice and random? I initially thought of using Jon Orwant's Statistics::ChiSquare module, but that has a couple of big drawbacks:
- it thinks a coin that throws 500 heads followed by 500 tails is just fine and dandy;
- it's limited to 21 discrete values because of the way it's implemented
What I'd like instead is something that:
- can determine whether data is evenly and randomly distributed across its range, and is equally evenly distributed regardless of which part of the sample I look at (i.e. the first 20 values should be just as random as the next 100); and
- can determine whether the data is at all predictable (i.e. can it detect that if the die rolls a 1 it's likely to roll a 4 three rolls later, or that if it rolls a 1 it won't roll a 1 next time)
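To make those two requirements concrete, here is a rough sketch of the kind of checks I have in mind. It is in Python rather than Perl purely for illustration, and every function name in it is made up for this example, not taken from any module. It assumes the values should be uniformly distributed over a known set of categories. It applies the ordinary chi-squared statistic to consecutive windows of the sample, and to pairs of values a fixed number of positions apart.

```python
from collections import Counter

def chi_squared_uniform(samples, categories):
    """Pearson chi-squared statistic against equal expected counts."""
    counts = Counter(samples)
    expected = len(samples) / len(categories)
    return sum((counts[c] - expected) ** 2 / expected for c in categories)

def windowed_chi_squared(samples, categories, window):
    """One statistic per consecutive, non-overlapping full window."""
    return [
        chi_squared_uniform(samples[i:i + window], categories)
        for i in range(0, len(samples) - window + 1, window)
    ]

def lag_chi_squared(samples, categories, lag):
    """Statistic for the pairs (x[i], x[i + lag]), assuming every
    possible pair of categories should be equally likely."""
    pairs = Counter(zip(samples, samples[lag:]))
    expected = (len(samples) - lag) / len(categories) ** 2
    return sum(
        (pairs[(a, b)] - expected) ** 2 / expected
        for a in categories
        for b in categories
    )

# The coin from above: 500 heads followed by 500 tails.
coin = ["H"] * 500 + ["T"] * 500
print(chi_squared_uniform(coin, ["H", "T"]))        # 0.0, looks perfect
print(windowed_chi_squared(coin, ["H", "T"], 100))  # ten values of 100.0

# A die where a 1 is always followed by a 4 three rolls later.
die = [1, 2, 3, 4, 5, 6] * 100
print(chi_squared_uniform(die, range(1, 7)))  # 0.0, looks perfect
print(lag_chi_squared(die, range(1, 7), 3))   # very large
```

The whole-sample statistic is 0.0 for both sequences, but the windowed and lagged versions are far above the usual 5% critical value (about 3.84 for one degree of freedom). The degrees of freedom would be the number of categories minus one for a window, and the number of possible pairs minus one for the lagged test. The pairs overlap, so they are not independent, and I'd treat the chi-squared distribution as only an approximation there.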
I'm not aware of anything on CPAN that can do that. An alternative would be - and we can do this because I'm only concerned about whether *I* am introducing bias, not with whether the data is biassed - to check that the distribution of my results is the same as the distribution of the original data. But I'm not aware of anything to do that either.
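For that alternative, the sort of comparison I mean could be a chi-squared test of homogeneity between the two samples. This is only a sketch, again in Python for illustration, with a made-up function name. It assumes the original and processed data take values from the same set of categories, and that both samples are non-empty. Like the plain chi-squared test it ignores ordering, so it would only catch bias in the overall distribution.

```python
from collections import Counter

def chi_squared_homogeneity(original, processed):
    """Return (statistic, degrees of freedom) for a test that two
    samples were drawn from the same categorical distribution."""
    a, b = Counter(original), Counter(processed)
    na, nb = len(original), len(processed)
    total = na + nb
    categories = set(a) | set(b)
    stat = 0.0
    for c in categories:
        pooled = a[c] + b[c]  # count of c across both samples
        for observed, n in ((a[c], na), (b[c], nb)):
            # Expected count if both samples share one distribution.
            expected = pooled * n / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(categories) - 1

same = [1, 2, 3] * 100
print(chi_squared_homogeneity(same, same))  # (0.0, 2)

skewed_stat, skewed_df = chi_squared_homogeneity([0, 1] * 50, [0] * 100)
print(skewed_stat > 3.84, skewed_df)        # True 1
```

A statistic well above the critical value for the given degrees of freedom would suggest that my processing has changed the distribution.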
So, can anyone point me at any appropriate modules? Or at an algorithm that I could turn into a module?