BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Given start, end & count, I want to generate an ordered set of count random values between start and end, that partition that range roughly equally. (Not exactly; then they wouldn't be random!)

So, I can do something like this:

sub rndPart {
    my( $start, $end, $count ) = @_;
    sort { $a <=> $b }
        map { sprintf "%.2f", $start + rand( $end - $start ) } 1 .. $count;
}

print "@{[ rndPart( 1, 10, 10 ) ]}\n" for 1 .. 10;

1.26 1.33 2.13 3.69 5.22 5.30 6.16 7.02 9.69 9.98
2.51 3.97 5.80 6.10 6.13 7.30 8.33 8.70 9.87 9.99
3.59 4.94 6.21 6.88 7.23 7.53 8.30 8.36 9.84 9.91
2.97 3.38 4.01 4.81 5.04 5.42 5.99 6.42 8.23 9.06
1.10 2.86 3.12 4.09 4.35 4.38 7.29 7.46 9.02 9.48
1.01 2.70 3.15 3.23 4.47 6.56 7.62 8.84 9.16 9.52
1.08 1.63 3.44 4.22 5.08 5.58 6.34 7.40 7.69 9.40
3.29 3.66 4.44 4.54 4.74 4.81 5.02 5.53 6.53 8.09
1.44 2.34 2.51 4.77 5.74 6.64 8.59 8.80 8.98 9.28

Which isn't too bad, but:

  1. There are no guarantees about the distribution:

    They could all end up clustered at one end, or the other, or bunched up in the middle somewhere.

  2. The need to sort is a drain on resources if the sets are large.

    It also means that the values are strictly increasing; and whilst I want the general trend to be so; I'd like to allow for the occasional blip above or below.

Is there a better way? Can the sort be avoided? Can the distribution be regularised without being either rigid, or precluding the occasional skewed set?

I've thought about generating a set of exact partitions, and then applying a randomised delta, but the implementation isn't falling off the page for me today.
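For reference, the "exact partitions plus randomised delta" idea can be sketched directly (the sub name rndPartDelta is mine, not from the thread): centre a point in each of $count equal sub-intervals, then nudge each one by up to half a step in either direction.

```perl
## Sketch of the "exact partitions plus randomised delta" idea:
## place a point at the centre of each of $count equal sub-intervals,
## then add a delta of up to half a step either way.
sub rndPartDelta {
    my( $start, $end, $count ) = @_;
    my $step = ( $end - $start ) / $count;
    map {
        my $mid = $start + $step * ( $_ + 0.5 );
        sprintf "%.2f", $mid + ( rand( $step ) - $step / 2 );
    } 0 .. $count - 1;
}
```

Because each point stays inside its own sub-interval, no sort is needed and no clustering can occur; the trade-off is that the result is always in order, so the occasional blip above or below is lost.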


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Replies are listed 'Best First'.
Re: Random partitions?
by hdb (Monsignor) on May 03, 2015 at 12:49 UTC

    Keywords to look for are stratification and quasi-random numbers.

    A simple method would be to partition your interval into n smaller parts and then generate count/n random numbers in each subinterval. In the extreme, you would generate one random number in each of count subintervals.
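    A minimal sketch of that stratified approach (the sub name stratified is mine; it assumes $n divides $count evenly):

```perl
## Stratified sampling sketch: split [$start, $end) into $n equal
## strata and draw $count/$n uniform values in each.
sub stratified {
    my( $start, $end, $count, $n ) = @_;
    my $width = ( $end - $start ) / $n;
    my $per   = $count / $n;
    sort { $a <=> $b } map {
        my $lo = $start + $width * $_;
        map { sprintf "%.2f", $lo + rand( $width ) } 1 .. $per;
    } 0 .. $n - 1;
}
```

    In the extreme case $n == $count there is one draw per stratum, the values come out already ordered, and the sort becomes a no-op.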

Re: Random partitions?
by RichardK (Parson) on May 03, 2015 at 13:16 UTC

    what about this? no sort and a somewhat adjustable distribution.

    sub parts {
        my( $start, $end, $count, $max ) = @_;
        my $v = $start;
        for my $i ( 1 .. $count ) {
            my $r;
            do { $r = $v + rand( $max ); } until $r < $end;
            $v = $r;
            printf "%.2f, ", $v;
        }
        print "\n";
    }

    for ( 0 .. 10 ) { parts( 0, 10, 10, 2.5 ); }

    0.90, 1.01, 1.84, 2.11, 3.56, 3.74, 4.57, 5.26, 6.47, 7.00,
    1.43, 1.89, 3.00, 4.48, 6.45, 7.86, 9.82, 9.96, 9.96, 9.98,
    0.90, 1.58, 1.86, 2.47, 3.34, 5.31, 5.56, 5.75, 6.50, 8.25,
    0.90, 2.88, 3.44, 3.93, 5.69, 7.20, 8.74, 9.31, 9.84, 9.89,

    It can give you repeated values, particularly at the end (e.g. ... 10.0, 10.0), but that's more an issue of rounding than anything else.

    Of course, if you set the max too large then it could take a long time to finish.

      As an extension of your idea, one could just do $count steps starting from zero, and then map the result to the desired interval.

      use strict;
      use warnings;

      sub parts {
          my( $start, $end, $count ) = @_;
          my @steps = ( 0 );
          push @steps, rand() + $steps[ -1 ] for 1 .. $count;
          my $scale = ( $end - $start ) / $steps[ -1 ];
          $_ *= $scale, $_ += $start for @steps;
          return @steps;
      }

      my @parts = parts 0, 10, 10;
      print "@parts\n";

      This, of course, is not able to generate skewed distributions, so it is less flexible.

Re: Random partitions? (Thanks and solution.)
by BrowserUk (Patriarch) on May 04, 2015 at 00:15 UTC

    Thanks to hdb, RichardK & pme.

    Based on ideas from your code I've derived two routines:

    1. The first follows the basic schema of picking $count evenly distributed points offset by $step/2 and adding rand( $step ) - $step/2;

      but, except for the first and last, it extends the random adjustment to rand( $step * 2 ) - $step, which allows for occasional overlaps (out-of-order points):

      sub rndPart1 {    ## can overlap
          my( $start, $end, $count ) = @_;
          my $step = ( $end - $start ) / $count;
          $start += $step / 2;
          map {
              my $mid = $start + $step * $_;
              my $v = $mid + ( ( $_ == 0 || $_ == ( $count - 1 ) )
                  ? ( rand( $step )     - $step / 2 )
                  : ( rand( $step * 2 ) - $step ) );
              sprintf "%5.2f", $v;
          } 0 .. $count - 1;
      }

      Some samples:

      0 .. 59 / 10:
       3.90 14.52 13.91 22.28 26.68 31.21 40.83 49.55 55.17 57.06
       3.87 12.36 12.22 26.06 25.67 31.14 37.76 47.42 52.80 55.30
       3.74  8.34 20.48 20.20 22.88 26.66 39.94 47.00 47.97 53.26
       4.44 12.14 11.63 18.69 31.22 30.80 42.69 39.36 48.64 57.32
       3.04 12.60 17.57 16.92 27.94 30.26 42.05 41.96 45.39 57.34
       5.74  3.39 12.46 23.94 20.98 29.66 34.13 46.07 49.56 54.50
       2.00  4.88 11.60 16.24 26.15 37.21 34.44 41.85 52.23 53.88
       4.87  9.13 17.46 24.54 32.38 35.39 36.27 48.09 55.00 56.48
       1.52  8.07  8.90 16.06 26.18 32.01 38.03 43.04 49.86 53.28
       2.75 14.32 11.62 15.78 22.40 35.69 44.05 38.89 49.32 54.03
       0.33  8.01 16.44 25.37 23.51 37.71 43.70 40.09 52.35 58.36
       4.54  6.33 19.35 18.14 31.89 31.81 42.15 46.19 45.40 57.71
       4.04 13.48 16.71 17.19 24.93 37.93 35.41 46.74 45.12 53.98
       4.50 13.99 12.20 19.43 30.19 28.36 36.90 43.51 44.91 54.97
       0.86  8.17 18.82 18.60 27.86 34.60 42.55 40.38 46.83 53.54
       0.52  4.29 19.80 15.06 31.78 32.97 35.28 44.16 52.55 57.26
       4.02 10.40 10.75 25.39 29.97 38.23 40.80 42.73 45.64 56.67
       3.80  8.54 12.36 25.85 20.68 30.68 40.63 38.98 52.05 57.25
       3.95 13.69  9.22 25.05 20.82 29.50 33.54 48.19 50.64 58.39
       0.41  6.58 13.65 25.90 26.83 33.76 34.38 46.55 45.82 55.61
    2. The second produces strictly ordered values -- I found I needed that sometimes -- but it extends the range of the adjustment down to the last picked value, thus allowing for skewed distributions whilst favouring more even ones.

      That is, for args ( 1, 4, 3 ), the first pick is 1.5 ± 0.5 as normal, but if that pick is (say) 1.1, then instead of the second pick being 2.5 ± 0.5 it becomes 2.5 + (-1.4 .. +0.5); and if that picked (say) 1.2, then the third pick will be 3.5 + (-2.3 .. +0.5 ).

      The probability of extreme skewness decreases exponentially. The input parameters here are (1,100,10), and the count is the number of tries before the highest value falls below 90% (80%,70%...):

      E:\Chart>rndChart -L=90
      3.19 7.43 15.68 19.31 47.80 48.67 50.26 68.37 68.46 82.69 after 2 tries

      E:\Chart>rndChart -L=80
      7.50 16.23 18.26 31.39 50.49 57.02 57.21 66.03 69.93 72.89 after 3 tries

      E:\Chart>rndChart -L=70
      10.10 17.22 17.46 37.29 39.13 58.73 60.07 62.71 65.91 68.60 after 43 tries

      E:\Chart>rndChart -L=60
      9.50 18.43 19.91 24.09 47.55 48.76 49.03 53.83 54.86 57.29 after 67 tries

      E:\Chart>rndChart -L=50
      4.88 19.23 29.45 39.03 40.90 42.33 43.60 43.81 45.65 49.11 after 3253 tries

      E:\Chart>rndChart -L=40
      5.10 12.01 14.57 14.75 20.23 21.83 27.22 31.43 31.72 39.10 after 100017 tries

      E:\Chart>rndChart -L=30
      10.16 12.98 15.25 15.67 17.10 17.89 21.85 23.07 28.24 28.28 after 2980335 tries

      E:\Chart>rndChart -L=20
      Terminating on signal SIGINT(2)

      It currently only does negative skewness. (If anyone can see a way to also produce positive skews with a similarly exponential rareness, I'd love to hear of it?)

      The code:

      sub rndPart2 {    ## strictly ascending, skew possible
          my( $start, $end, $count ) = @_;
          my $step = ( $end - $start ) / $count;
          my $last = $start;
          $start += $step / 2;
          map {
              sprintf "%5.2f", $last = $last + rand( $_ + $step / 2 - $last );
          } map {
              $start + $step * $_
          } 0 .. $count - 1;
      }




      Can you get your "positive skews" by tossing a coin at the start to determine the end you work from?

      Perl is the programming world's equivalent of English

        That'd work, but basically means duplicating the loop logic. (In part or in full.)

        Update: Cleaner second loop.

        But as I've already changed the routine to produce an array and return a ref to it -- 3 intermediate lists get costly once count gets over a few hundred -- tossing the coin afterward and reversing and 'inverting' the values is a bit easier:

        sub rndPart3 {    ## strictly ascending, skew possible
            my( $start, $end, $count ) = @_;
            my $step = ( $end - $start ) / $count;
            my $last = $start;
            $start += $step / 2;
            my @res;
            for( my( $i, $p ) = ( 0, $start ); $i < $count; ++$i, $p += $step ) {
                push @res, sprintf "%.2f",
                    $last = $last + rand( $p + $step / 2 - $last );
            }
            if( rand() < 0.5 ) {    ## coin toss: mirror the set to flip the skew
                for( my( $s, $e ) = ( 0, $#res ); $s <= $e; ++$s, --$e ) {
                    my $temp = $end - $res[ $s ];
                    $res[ $s ] = $end - $res[ $e ];
                    $res[ $e ] = $temp;
                }
            }
            return \@res;
        }

        Not very pleasing though!?


Re: Random partitions?
by pme (Monsignor) on May 03, 2015 at 18:44 UTC
    This is a slightly different version:

    sub rnd_part {
        my( $start, $end, $count ) = @_;
        my $step = ( $end - $start ) / $count;
        $start += $step / 2.0;
        do {
            printf "%f ", $start + ( rand( $step ) - $step / 2.0 );
            $start += $step;
        } while ( $start < $end );
        print "\n";
    }
Re: Random partitions?
by QM (Parson) on May 05, 2015 at 08:37 UTC
    I was curious what your metric would be for a good distribution vs. bad?

    For instance, what is the mean and variance for the step size? Or if there's some density metric over some interval, such as count?

    Deciding what this should be might lead more directly to an acceptable algorithm.
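    One concrete form such a metric could take (my sketch, not QM's code): the mean and variance of the successive gap sizes, which an acceptance loop could then test against chosen bounds.

```perl
## Sketch of one possible "goodness" metric: mean and variance of
## the gaps between successive (sorted) values.
sub step_stats {
    my @v = sort { $a <=> $b } @_;
    my @gaps = map { $v[$_] - $v[$_ - 1] } 1 .. $#v;
    my $mean = 0;
    $mean += $_ / @gaps for @gaps;
    my $var = 0;
    $var += ( $_ - $mean ) ** 2 / @gaps for @gaps;
    return ( $mean, $var );
}
```

    A perfectly regular partition gives zero variance; a clustered one gives a large variance, so a threshold between the two pins down "acceptable".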

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      For instance, what is the mean and variance for the step size? Or if there's some density metric over some interval, such as count?

      I needed to generate some (eventually large) test datasets for my graphing app; so all I really needed was something that was roughly a straight line between two points, but with enough variation to allow co-planar lines to be discernible. (I.e. not overlaid.)

      It's for checking that boundary, ranging, scaling & tick mark calculations etc. do sensible things with generic datasets.

      So far, I'm very happy with what came out of this thread (see: Re: Random partitions? (Thanks and solution.)). With the addition of another switch, -P=2.718, the program raises the supplied number to the power of each randomly generated value before output, thus I can test the autoselecting of log scales.
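      That -P post-processing amounts to a single exponentiation pass over the generated values (the variable names below are mine, for illustration):

```perl
## Sketch of the described -P switch: raise the supplied base to each
## generated value, so a linear trend becomes straight on a log scale.
my $base   = 2.718;
my @linear = ( 1, 2, 3 );                  # stand-in for generated values
my @scaled = map { $base ** $_ } @linear;
```

      Successive equal steps in the input then give a constant ratio in the output, which is exactly what a log-scale autoselector should detect.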

