BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Given start, end & count, I want to generate an ordered set of count random values between start and end, that partition that range roughly equally. (Not exactly; then they wouldn't be random!)

So, I can do something like this:

sub rndPart {
    my( $start, $end, $count ) = @_;
    sort { $a <=> $b }
        map { sprintf "%.2f", $start + rand( $end - $start ) } 1 .. $count;
}

print "@{[ rndPart( 1, 10, 10 ) ]}\n" for 1 .. 10;

1.26 1.33 2.13 3.69 5.22 5.30 6.16 7.02 9.69 9.98
2.51 3.97 5.80 6.10 6.13 7.30 8.33 8.70 9.87 9.99
3.59 4.94 6.21 6.88 7.23 7.53 8.30 8.36 9.84 9.91
2.97 3.38 4.01 4.81 5.04 5.42 5.99 6.42 8.23 9.06
1.10 2.86 3.12 4.09 4.35 4.38 7.29 7.46 9.02 9.48
1.01 2.70 3.15 3.23 4.47 6.56 7.62 8.84 9.16 9.52
1.08 1.63 3.44 4.22 5.08 5.58 6.34 7.40 7.69 9.40
3.29 3.66 4.44 4.54 4.74 4.81 5.02 5.53 6.53 8.09
1.44 2.34 2.51 4.77 5.74 6.64 8.59 8.80 8.98 9.28

Which isn't too bad, but:

  1. There are no guarantees about the distribution:

    They could all end up clustered at one end, or the other, or bunched up in the middle somewhere.

  2. The need to sort is a drain on resources if the sets are large.

    It also means that the values are strictly increasing; and whilst I want the general trend to be so; I'd like to allow for the occasional blip above or below.

Is there a better way? Can the sort be avoided? Can the distribution be regularised without being either rigid, or precluding the occasional skewed set?

I've thought about generating a set of exact partitions, and then applying a randomised delta, but the implementation isn't falling off the page for me today.
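For reference, the "exact partitions plus randomised delta" idea can be sketched directly (the sub name rndPartDelta is mine, not from the thread): centre a point in each of $count equal sub-intervals, then nudge each one by up to half a step in either direction.

```perl
## Sketch of the "exact partitions plus randomised delta" idea:
## place a point at the centre of each of $count equal sub-intervals,
## then add a delta of up to half a step either way.
sub rndPartDelta {
    my( $start, $end, $count ) = @_;
    my $step = ( $end - $start ) / $count;
    map {
        my $mid = $start + $step * ( $_ + 0.5 );
        sprintf "%.2f", $mid + ( rand( $step ) - $step / 2 );
    } 0 .. $count - 1;
}
```

Because each point stays inside its own sub-interval, no sort is needed and no clustering can occur; the trade-off is that the result is always in order, so the occasional blip above or below is lost.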


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Replies are listed 'Best First'.
Re: Random partitions?
by hdb (Monsignor) on May 03, 2015 at 12:49 UTC

    Keywords to look for are stratification and quasi-random numbers.

    A simple method would be to partition your interval into n smaller parts and then generate count/n random numbers in each subinterval. In the extreme, you would generate one random number in each of count subintervals.
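    A minimal sketch of that stratified approach (the sub name stratified is mine; it assumes $n divides $count evenly):

```perl
## Stratified sampling sketch: split [$start, $end) into $n equal
## strata and draw $count/$n uniform values in each.
sub stratified {
    my( $start, $end, $count, $n ) = @_;
    my $width = ( $end - $start ) / $n;
    my $per   = $count / $n;
    sort { $a <=> $b } map {
        my $lo = $start + $width * $_;
        map { sprintf "%.2f", $lo + rand( $width ) } 1 .. $per;
    } 0 .. $n - 1;
}
```

    In the extreme case $n == $count there is one draw per stratum, the values come out already ordered, and the sort becomes a no-op.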

Re: Random partitions?
by RichardK (Parson) on May 03, 2015 at 13:16 UTC

    what about this? no sort and a somewhat adjustable distribution.

    sub parts {
        my( $start, $end, $count, $max ) = @_;
        my $v = $start;
        for my $i ( 1 .. $count ) {
            my $r;
            do { $r = $v + rand( $max ); } until $r < $end;
            $v = $r;
            printf "%.2f, ", $v;
        }
        print "\n";
    }

    for ( 0 .. 10 ) { parts( 0, 10, 10, 2.5 ); }

    0.90, 1.01, 1.84, 2.11, 3.56, 3.74, 4.57, 5.26, 6.47, 7.00,
    1.43, 1.89, 3.00, 4.48, 6.45, 7.86, 9.82, 9.96, 9.96, 9.98,
    0.90, 1.58, 1.86, 2.47, 3.34, 5.31, 5.56, 5.75, 6.50, 8.25,
    0.90, 2.88, 3.44, 3.93, 5.69, 7.20, 8.74, 9.31, 9.84, 9.89,

    It can give you repeated values, particularly at the end (e.g. ... 10.0, 10.0), but that's more an issue of rounding than anything else.

    Of course, if you set the max too large then it could take a long time to finish.

      As an extension of your idea, one could just do $count steps starting from zero, and then map the result to the desired interval.

      use strict;
      use warnings;

      sub parts {
          my( $start, $end, $count ) = @_;
          my @steps = ( 0 );
          push @steps, rand() + $steps[ -1 ] for 1 .. $count;
          my $scale = ( $end - $start ) / $steps[ -1 ];
          $_ *= $scale, $_ += $start for @steps;
          return @steps;
      }

      my @parts = parts 0, 10, 10;
      print "@parts\n";

      This, of course, is not able to generate skewed distributions, so it is less flexible.

Re: Random partitions? (Thanks and solution.)
by BrowserUk (Patriarch) on May 04, 2015 at 00:15 UTC

    Thanks to hdb, RichardK & pme.

    Based on ideas from your code I've derived two routines:

    1. The first follows the basic schema of picking $count evenly distributed points offset by $step/2 and adding rand( $step ) - $step/2;

      but, except for the first and last, it extends the random adjustment to rand( $step * 2 ) - $step, which allows for occasional overlaps (out-of-order points):

      sub rndPart1 {    ## can overlap
          my( $start, $end, $count ) = @_;
          my $step = ( $end - $start ) / $count;
          $start += $step / 2;
          map {
              my $mid = $start + $step * $_;
              my $v = $mid + ( ( $_ == 0 || $_ == ( $count - 1 ) )
                  ? ( rand( $step )     - $step / 2 )
                  : ( rand( $step * 2 ) - $step ) );
              sprintf "%5.2f", $v;
          } 0 .. $count - 1;
      }

      Some samples:

      0 .. 59 / 10:
       3.90 14.52 13.91 22.28 26.68 31.21 40.83 49.55 55.17 57.06
       3.87 12.36 12.22 26.06 25.67 31.14 37.76 47.42 52.80 55.30
       3.74  8.34 20.48 20.20 22.88 26.66 39.94 47.00 47.97 53.26
       4.44 12.14 11.63 18.69 31.22 30.80 42.69 39.36 48.64 57.32
       3.04 12.60 17.57 16.92 27.94 30.26 42.05 41.96 45.39 57.34
       5.74  3.39 12.46 23.94 20.98 29.66 34.13 46.07 49.56 54.50
       2.00  4.88 11.60 16.24 26.15 37.21 34.44 41.85 52.23 53.88
       4.87  9.13 17.46 24.54 32.38 35.39 36.27 48.09 55.00 56.48
       1.52  8.07  8.90 16.06 26.18 32.01 38.03 43.04 49.86 53.28
       2.75 14.32 11.62 15.78 22.40 35.69 44.05 38.89 49.32 54.03
       0.33  8.01 16.44 25.37 23.51 37.71 43.70 40.09 52.35 58.36
       4.54  6.33 19.35 18.14 31.89 31.81 42.15 46.19 45.40 57.71
       4.04 13.48 16.71 17.19 24.93 37.93 35.41 46.74 45.12 53.98
       4.50 13.99 12.20 19.43 30.19 28.36 36.90 43.51 44.91 54.97
       0.86  8.17 18.82 18.60 27.86 34.60 42.55 40.38 46.83 53.54
       0.52  4.29 19.80 15.06 31.78 32.97 35.28 44.16 52.55 57.26
       4.02 10.40 10.75 25.39 29.97 38.23 40.80 42.73 45.64 56.67
       3.80  8.54 12.36 25.85 20.68 30.68 40.63 38.98 52.05 57.25
       3.95 13.69  9.22 25.05 20.82 29.50 33.54 48.19 50.64 58.39
       0.41  6.58 13.65 25.90 26.83 33.76 34.38 46.55 45.82 55.61
    2. The second produces strictly ordered values -- I found I needed that sometimes -- but it extends the range of the adjustment down to the last picked value, thus allowing for skewed distributions whilst favouring more even ones.

      That is, for args ( 1, 4, 3 ), the first pick is 1.5 ± 0.5 as normal, but if that pick is (say) 1.1, then instead of the second pick being 2.5 ± 0.5 it becomes 2.5 + (-1.4 .. +0.5); and if that picked (say) 1.2, then the third pick will be 3.5 + (-2.3 .. +0.5 ).

      The probability of extreme skewness decreases exponentially. The input parameters here are (1,100,10), and the count is the number of tries before the highest value falls below 90% (80%,70%...):

      E:\Chart>rndChart -L=90
      3.19 7.43 15.68 19.31 47.80 48.67 50.26 68.37 68.46 82.69 after 2 tries

      E:\Chart>rndChart -L=80
      7.50 16.23 18.26 31.39 50.49 57.02 57.21 66.03 69.93 72.89 after 3 tries

      E:\Chart>rndChart -L=70
      10.10 17.22 17.46 37.29 39.13 58.73 60.07 62.71 65.91 68.60 after 43 tries

      E:\Chart>rndChart -L=60
      9.50 18.43 19.91 24.09 47.55 48.76 49.03 53.83 54.86 57.29 after 67 tries

      E:\Chart>rndChart -L=50
      4.88 19.23 29.45 39.03 40.90 42.33 43.60 43.81 45.65 49.11 after 3253 tries

      E:\Chart>rndChart -L=40
      5.10 12.01 14.57 14.75 20.23 21.83 27.22 31.43 31.72 39.10 after 100017 tries

      E:\Chart>rndChart -L=30
      10.16 12.98 15.25 15.67 17.10 17.89 21.85 23.07 28.24 28.28 after 2980335 tries

      E:\Chart>rndChart -L=20
      Terminating on signal SIGINT(2)

      It currently only does negative skewness. (If anyone can see a way to also produce positive skews with a similarly exponential rareness, I'd love to hear of it?)

      The code:

      sub rndPart2 {    ## strictly ascending, skew possible
          my( $start, $end, $count ) = @_;
          my $step = ( $end - $start ) / $count;
          my $last = $start;
          $start += $step / 2;
          map {
              sprintf "%5.2f", $last = $last + rand( $_ + $step / 2 - $last );
          } map {
              $start + $step * $_
          } 0 .. $count - 1;
      }




      Can you get your "positive skews" by tossing a coin at the start to determine the end you work from?

      Perl is the programming world's equivalent of English

        That'd work, but basically means duplicating the loop logic. (In part or in full.)

        Update: Cleaner second loop.

        But as I've already changed the routine to produce an array and return a ref to it -- 3 intermediate lists get costly once count gets over a few hundred -- tossing the coin afterward and reversing and 'inverting' the values is a bit easier:

        sub rndPart3 {    ## strictly ascending, skew possible
            my( $start, $end, $count ) = @_;
            my $step = ( $end - $start ) / $count;
            my $last = $start;
            $start += $step / 2;
            my @res;
            for( my( $i, $p ) = ( 0, $start ); $i < $count; ++$i, $p += $step ) {
                push @res, sprintf "%.2f",
                    $last = $last + rand( $p + $step / 2 - $last );
            }
            if( rand() < 0.5 ) {    ## coin toss: mirror the set to flip the skew
                for( my( $s, $e ) = ( 0, $#res ); $s <= $e; ++$s, --$e ) {
                    my $temp = $end - $res[ $s ];
                    $res[ $s ] = $end - $res[ $e ];
                    $res[ $e ] = $temp;
                }
            }
            return \@res;
        }

        Not very pleasing though!?


Re: Random partitions?
by pme (Monsignor) on May 03, 2015 at 18:44 UTC
    This is a slightly different version:

    sub rnd_part {
        my( $start, $end, $count ) = @_;
        my $step = ( $end - $start ) / $count;
        $start += $step / 2.0;
        do {
            printf "%f ", $start + ( rand( $step ) - $step / 2.0 );
            $start += $step;
        } while ( $start < $end );
        print "\n";
    }
Re: Random partitions?
by QM (Parson) on May 05, 2015 at 08:37 UTC
    I was curious what your metric would be for a good distribution vs. bad?

    For instance, what is the mean and variance for the step size? Or if there's some density metric over some interval, such as count?

    Deciding what this should be might lead more directly to an acceptable algorithm.
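    One concrete form such a metric could take (my sketch, not QM's code): the mean and variance of the successive gap sizes, which an acceptance loop could then test against chosen bounds.

```perl
## Sketch of one possible "goodness" metric: mean and variance of
## the gaps between successive (sorted) values.
sub step_stats {
    my @v = sort { $a <=> $b } @_;
    my @gaps = map { $v[$_] - $v[$_ - 1] } 1 .. $#v;
    my $mean = 0;
    $mean += $_ / @gaps for @gaps;
    my $var = 0;
    $var += ( $_ - $mean ) ** 2 / @gaps for @gaps;
    return ( $mean, $var );
}
```

    A perfectly regular partition gives zero variance; a clustered one gives a large variance, so a threshold between the two pins down "acceptable".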

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      For instance, what is the mean and variance for the step size? Or if there's some density metric over some interval, such as count?

      I needed to generate some (eventually large) test datasets for my graphing app; so all I really needed was something that was roughly a straight line between two points, but with enough variation to allow co-planar lines to be discernible. (I.e. not overlaid.)

      It's for checking that boundary, ranging, scaling & tick mark calculations etc. do sensible things with generic datasets.

      So far, I'm very happy with what came out of this thread (see: Re: Random partitions? (Thanks and solution.)). With the addition of another switch, -P=2.718, the program raises the supplied number to the power of each randomly generated value before output, thus I can test the autoselecting of log scales.
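      That -P post-processing amounts to a single exponentiation pass over the generated values (the variable names below are mine, for illustration):

```perl
## Sketch of the described -P switch: raise the supplied base to each
## generated value, so a linear trend becomes straight on a log scale.
my $base   = 2.718;
my @linear = ( 1, 2, 3 );                  # stand-in for generated values
my @scaled = map { $base ** $_ } @linear;
```

      Successive equal steps in the input then give a constant ratio in the output, which is exactly what a log-scale autoselector should detect.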

