comment on

*I'm not at all convinced that these two set of values are "more or less the same": (pvalues=,0.4869334,0.4961692,0.8251134,0.2657584,0.84692,0.1296479,0. +504212,0.9028209,0.8082847,0.2999607,0.154672,0.1660518,0.5143663,0.8 +120685,0.4452244,0.6561128,0.6123136,0.6994308,0.9302561,0.4757345) pvalues=,0.25,0.25,0.75,0.25,0.75,0.1,0.5,0.9,0.75,0.25,0.1,0.1,0.5,0. +75,0.25,0.5,0.5,0.5,0.9,0.25)

Right, so p-values sometimes are comparable, sometimes are not. R must be calculating p-values using some kind of monte-carlo. That's why they have the granularity you observed (as a ratio of success/total_trials or something). Whereas in Statistics::ChiSquare there is a lookup table column for each p-value (the 100%, 95 etc.) someone thought useful to include. And so the latter must return a p-value range for which the (correctly as far as I can see from reading diagonally the output) calculated statistic falls in. If the range in the table is small (e.g. 100-99) then p-values will more-or-less coincide (probably what I posted earlier).

If however the range is big (e.g. 50-25) then they will coincide only when R's calculated p-value is near the upper range. The cure for this is to use R's method for calculating p-vs which I guess is state-of-the-art. I don't think anyone wants to open that can of warms and start translating R recipes in pure perl. So, for me using once in a while Statistics::R for some obscure statistical methods is OK. For the record, all data was created in Perl and then was sent to R for just the chi-square test.

I agree with you also that chi-square may not be the perfect tool for this particular situation. It may well be (or accepted by statisticians) for others. At the end, one has to decide what they want out of a shuffle: destroy patterns in the sequence of the original dataset? Create random datasets (as you do where aim is all items to be equally present)? Or to split a mother set into smaller sets whose average properties approach the average of the mother set - for medical trials for example. Once the purpose is clear then a test can be used accordingly.

Being able to work out the efficacy of your shuffle is a useful thing but many don't bother with it. Let me know of your progress.

Oh, and thank you for providing some interesting diversion for a boring Sunday:)

likewise, though I did have a boring swim in between

In reply to Re^9: Last comment (and nail in the coffin of) of S::CS :) by bliako
in thread Shuffling CODONS by WouterVG

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.