*I'm not at all convinced that these two sets of values are "more or less the same":
(pvalues=,0.4869334,0.4961692,0.8251134,0.2657584,0.84692,0.1296479,0.504212,0.9028209,0.8082847,0.2999607,0.154672,0.1660518,0.5143663,0.8120685,0.4452244,0.6561128,0.6123136,0.6994308,0.9302561,0.4757345)
(pvalues=,0.25,0.25,0.75,0.25,0.75,0.1,0.5,0.9,0.75,0.25,0.1,0.1,0.5,0.75,0.25,0.5,0.5,0.5,0.9,0.25)
Right, so the p-values are sometimes comparable and sometimes not. R must be calculating its p-values using some kind of Monte Carlo method. That would explain the granularity you observed (as a ratio of successes to total trials, or something like that). Statistics::ChiSquare, on the other hand, has a lookup table with one column for each p-value (100%, 95%, etc.) that someone thought useful to include. So it must return the p-value range into which the calculated statistic falls (the statistic itself is calculated correctly, as far as I can see from skimming the output). If the range in the table is small (e.g. 100-99), the p-values will more or less coincide (which is probably what I posted earlier).
If, however, the range is big (e.g. 50-25), they will coincide only when R's calculated p-value is near the lower end of that range (compare 0.2657584 against 0.25 with 0.4869334 against 0.25 in the values quoted above). The cure for this is to use R's method for calculating p-values, which I guess is state of the art. I don't think anyone wants to open that can of worms and start translating R recipes into pure Perl. So, for me, using Statistics::R once in a while for some obscure statistical method is OK. For the record, all the data was created in Perl and then sent to R just for the chi-square test.
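To make the difference between a table range and a directly computed p-value concrete, here is a minimal Python sketch. It is not R's code and not Statistics::ChiSquare's code: the names `chi2_sf`, `critical_value`, `table_bracket` and the list `LEVELS` are all made up for illustration, and the set of tabulated levels is an assumption rather than the module's real table. The tail probability is computed from the series for the regularized lower incomplete gamma function.

```python
import math

# Assumed table columns; the real module's columns may differ.
LEVELS = [0.99, 0.95, 0.9, 0.75, 0.5, 0.25, 0.1, 0.05, 0.01]

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    # Series: gamma(a, h) = h^a * e^-h * sum_n h^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def critical_value(p, df):
    """Statistic whose upper-tail probability is p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def table_bracket(stat, df, levels=LEVELS):
    """Return the (low, high) p-value range a coarse table can give for stat."""
    high = 1.0
    for p in sorted(levels, reverse=True):
        if stat < critical_value(p, df):
            return (p, high)
        high = p
    return (0.0, high)

if __name__ == "__main__":
    stat, df = 1.5, 2
    print("direct p-value:", chi2_sf(stat, df))
    print("table range:   ", table_bracket(stat, df))
```

With 2 degrees of freedom and a statistic of 1.5, the direct value is exp(-0.75), about 0.472, while the table can only say "between 0.25 and 0.5". That is the same kind of gap as 0.4869334 versus 0.25 in the quoted values.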
I also agree with you that chi-square may not be the perfect tool for this particular situation, though it may well be (or at least be accepted by statisticians) for others. In the end, one has to decide what one wants out of a shuffle: to destroy patterns in the sequence of the original dataset? To create random datasets (as you do, where the aim is for all items to be equally present)? Or to split a mother set into smaller sets whose average properties approach the average of the mother set, for medical trials for example? Once the purpose is clear, a test can be chosen accordingly.
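For the "all items equally present" purpose, one possible check is to shuffle many times, count which item ends up in a given position, and compute the chi-square statistic against equal expected counts. The Python sketch below is only an illustration under my own assumptions: `position_chi2`, `fair`, `lazy`, the four codons and the number of trials are all hypothetical, and it tests a single position only, so it says nothing about patterns elsewhere in the sequence.

```python
import random
from collections import Counter

def position_chi2(shuffle, items, trials=6000, position=0, seed=1):
    """Chi-square statistic for how evenly distinct items land at one position."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        deck = list(items)
        shuffle(deck, rng)  # shuffles the copy in place
        counts[deck[position]] += 1
    expected = trials / len(items)
    stat = sum((counts[x] - expected) ** 2 / expected for x in items)
    return stat, len(items) - 1  # statistic and degrees of freedom

def fair(deck, rng):
    # Python's built-in shuffle (Fisher-Yates).
    rng.shuffle(deck)

def lazy(deck, rng):
    # Deliberately poor shuffle: swap only the first two items, half of the time.
    if rng.random() < 0.5:
        deck[0], deck[1] = deck[1], deck[0]

if __name__ == "__main__":
    codons = ["ATG", "GCC", "TTA", "CGA"]  # made-up example data
    print("fair:", position_chi2(fair, codons))
    print("lazy:", position_chi2(lazy, codons))
```

The poor shuffle never puts the last two codons in the first position, so its statistic is enormous compared with the fair one. A different purpose (for example, balanced subsets) would need a different statistic.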
Being able to work out the efficacy of your shuffle is a useful thing but many don't bother with it. Let me know of your progress.
Oh, and thank you for providing some interesting diversion for a boring Sunday:)
likewise, though I did have a boring swim in between
In reply to Re^9: Last comment (and nail in the coffin of) of S::CS :)
by bliako
in thread Shuffling CODONS
by WouterVG