in reply to random #s

Why sort at all? Generate your random numbers already ordered:

print+( @orderedRands = grep{ rand(1) < 0.075 } map{ ($_) x 20 } 1 .. 20 )[0..19];;
1 1 1 1 2 2 3 3 4 5 5 5 6 6 7 8 9 9 10 11

print+( @orderedRands = grep{ rand(1) < 0.075 } map{ ($_) x 20 } 1 .. 20 )[0..19];;
2 2 2 3 4 5 5 5 9 9 10 12 13 13 14 14 16 17 18 18

print+( @orderedRands = grep{ rand(1) < 0.075 } map{ ($_) x 20 } 1 .. 20 )[0..19];;
1 2 4 4 4 4 4 4 6 6 6 8 8 8 10 11 11 12 12 12

print+( @orderedRands = grep{ rand(1) < 0.075 } map{ ($_) x 20 } 1 .. 20 )[0..19];;
1 1 3 3 3 4 4 5 6 6 7 8 8 9 10 10 11 12 14 15

print+( @orderedRands = grep{ rand(1) < 0.075 } map{ ($_) x 20 } 1 .. 20 )[0..19];;
1 1 2 2 2 3 3 3 4 5 5 5 6 7 7 8 10 11 12 13

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: random #s
by davido (Cardinal) on Oct 07, 2016 at 17:30 UTC

    I think this exhibits a pretty strong bias toward the lower numbers generated, and against the higher numbers. It may be pseudorandom, but the distribution is not close to flat.

    use List::Util qw(sum);

    my @count = (0) x 20;
    for (1 .. 200000) {
        my @orderedRands =
            (grep{ rand(1) < 0.075 } map{ ($_) x 20 } 1 .. 20)[0..19];
        $count[$_]++ for grep {defined} @orderedRands;
    }
    my $sum = sum(@count);
    print "$_\t=>\t$count[$_]\t(", sprintf('%.2f', ($count[$_]/$sum)*100),
        "% TTL)\n" for 1 .. 20;

    ...which produces...

    1   =>  300463  (7.53% TTL)
    2   =>  300337  (7.52% TTL)
    3   =>  300667  (7.53% TTL)
    4   =>  300333  (7.52% TTL)
    5   =>  300725  (7.53% TTL)
    6   =>  299081  (7.49% TTL)
    7   =>  299696  (7.51% TTL)
    8   =>  297134  (7.44% TTL)
    9   =>  289983  (7.26% TTL)
    10  =>  276352  (6.92% TTL)
    11  =>  252348  (6.32% TTL)
    12  =>  217016  (5.44% TTL)
    13  =>  175867  (4.41% TTL)
    14  =>  133708  (3.35% TTL)
    15  =>  96010   (2.40% TTL)
    16  =>  65012   (1.63% TTL)
    17  =>  41130   (1.03% TTL)
    18  =>  24485   (0.61% TTL)
    19  =>  14228   (0.36% TTL)
    20  =>  7685    (0.19% TTL)

    ...and that is after adding a filter to discard the undefined values that found their way into @orderedRands.

    I hope my adaptation of your algorithm didn't miss an important nuance.


    Dave

      this exhibits a pretty strong bias toward the lower numbers generated, and against the higher numbers.

      Yes. A side-effect of my lazy way of attempting to ensure that at least 20 numbers are produced each time.

      With 400 inputs to choose from, the selector value should be 0.05, not 0.075 (400 × 0.05 gives an expected 20 picks); but the nature of randomness is that whilst 0.05 produces a fair pick:

      [undef, 999508, 999959, 1000278, 1002083, 999969, 1001388, 1002007, 999127, 1000314, 1001289, 1000014, 999255, 1000929, 1001682, 1000862, 998954, 1002277, 999569, 1000337, 999569]

      It will on occasion produce as many as 44 values or as few as 3:

      [undef, undef, undef, 2, 7, 33, 157, 398, 1041, 2437, 5333, 9541, 16508, 25675, 37569, 50824, 64506, 76623, 85656, 90711, 91374, 86560, 78584, 68077, 56808, 44695, 33849, 24855, 17439, 11775, 7601, 4708, 2839, 1690, 981, 568, 290, 137, 74, 41, 12, 9, 11, 1, 1]

      By raising the selector value to 0.075, I made it far more likely that it would produce at least 20 values. The list slice ensures that it is not more than 20; but also introduces the bias by always throwing away the higher values when it overproduces.

      The following results, from the above code with the only change being 0.05 => 0.075, demonstrate that the distribution over the range is still very fair, but that on average 50% more numbers are produced each time before the slice operation trims them back (which introduces the bias). They also show that the probability of under-producing is greatly lessened:

      [undef, 1495533, 1499609, 1498974, 1499522, 1501930, 1501314, 1499981, 1501222, 1499646, 1500600, 1500068, 1500915, 1498017, 1500384, 1501031, 1498257, 1500431, 1501058, 1498359, 1500716]

      [undef, undef, undef, undef, undef, undef, undef, undef, undef, 2, 12, 31, 75, 170, 373, 768, 1568, 2718, 4802, 7795, 12096, 17806, 24879, 32813, 41477, 51438, 59539, 66763, 72668, 74986, 75575, 73039, 68515, 61653, 53917, 46112, 37751, 30042, 23138, 17536, 12842, 9196, 6284, 4298, 2803, 1748, 1053, 721, 429, 253, 156, 80, 43, 17, 8, 8, 2, undef, 1, undef, 1]
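
      (An aside of my own, not from the original post: the bias can also be seen directly by tallying which values the [0..19] slice throws away whenever the 0.075 selector overproduces. Because the grep output is already ordered, everything trimmed comes from the top of the range.)

      #!/usr/bin/perl
      # Illustrative sketch only: count which values the slice discards.
      use strict;
      use warnings;

      my @dropped = (0) x 21;
      for (1 .. 100_000) {
          my @picked = grep { rand(1) < 0.075 } map { ($_) x 20 } 1 .. 20;
          next if @picked <= 20;                       # nothing trimmed this run
          $dropped[$_]++ for @picked[20 .. $#picked];  # values lost to the [0..19] slice
      }
      printf "%2d dropped %6d times\n", $_, $dropped[$_] for 1 .. 20;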

      This could be fixed by repeating the process until exactly 20 numbers come out, which ensures fairness:

      #! perl -slw
      use strict;
      use Data::Dump qw[ pp ];
      $Data::Dump::WIDTH = 300;

      my( @counts, @ns );
      for( 1 .. 1e6 ) {
          my @orderedRands = grep{ rand(1) < 0.05 } map{ ($_) x 20 } 1 .. 20;
          # Regenerate until exactly 20 values survive the grep.
          while( @orderedRands != 20 ) {
              @orderedRands = grep{ rand(1) < 0.05 } map{ ($_) x 20 } 1 .. 20;
          }
          ++$ns[ @orderedRands ];            # how many values each run produced
          ++$counts[ $_ ] for @orderedRands; # how often each value 1..20 appears
      }
      pp \@counts, \@ns;
      __END__
      C:\test>junk62
      (
        [undef, 1000652, 999987, 1000022, 999969, 999146, 1000961, 1000568, 1000129, 1000725, 999884, 999509, 999756, 1000538, 999763, 1000708, 1000826, 999799, 998778, 998714, 999566],
        [undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, 1000000],
      )

      Of course that is far more expensive than doing the sort that it avoids.
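
      (A rough, illustrative comparison of my own, not from the original post. The two subs below are not statistically identical -- one retries against the 400-candidate pool, the other draws 20 uniform values with replacement -- the point is only the relative cost of retrying versus sorting.)

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      cmpthese( -2, {
          # Regenerate until exactly 20 values survive the 0.05 selector.
          retry_until_20 => sub {
              my @orderedRands;
              do {
                  @orderedRands = grep { rand(1) < 0.05 } map { ($_) x 20 } 1 .. 20;
              } until @orderedRands == 20;
              return \@orderedRands;
          },
          # Draw 20 values uniformly from 1..20 and sort them.
          generate_and_sort => sub {
              my @rands = sort { $a <=> $b } map { 1 + int rand 20 } 1 .. 20;
              return \@rands;
          },
      } );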

      But then, my post was nothing more than a semi-humorous response to a question that itself is something of a joke.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The algorithm needs a minor tweak:

      #!/usr/bin/perl
      # http://perlmonks.org/?node_id=1173445
      use strict;
      use warnings;
      use List::Util qw(sum);

      my @count = (0) x 20;
      for (1 .. 2e4)
      {
          # Keep each of the 400 candidates with probability
          # (picks still needed) / (candidates still remaining),
          # which always yields exactly 20 values, already in order.
          my ($top, $bottom) = (20, 20 * 20);
          my @orderedRands = grep{
              $bottom > 0 && rand(1) < $top / $bottom
                  ? (--$top, --$bottom, 1)
                  : (--$bottom, 0)
          } map{ ($_) x 20 } 1 .. 20;
          $count[$_]++ for grep {defined} @orderedRands;
      }
      my $sum = sum(@count);
      print "$_\t=>\t$count[$_]\t(", sprintf('%.2f', ($count[$_]/$sum)*100),
          "% TTL)\n" for 1 .. 20;

      ...which produces...

      1   =>  20031  (5.01% TTL)
      2   =>  20037  (5.01% TTL)
      3   =>  19831  (4.96% TTL)
      4   =>  19953  (4.99% TTL)
      5   =>  19840  (4.96% TTL)
      6   =>  20068  (5.02% TTL)
      7   =>  19868  (4.97% TTL)
      8   =>  20030  (5.01% TTL)
      9   =>  19790  (4.95% TTL)
      10  =>  20076  (5.02% TTL)
      11  =>  19944  (4.99% TTL)
      12  =>  19909  (4.98% TTL)
      13  =>  20049  (5.01% TTL)
      14  =>  20150  (5.04% TTL)
      15  =>  19940  (4.98% TTL)
      16  =>  20112  (5.03% TTL)
      17  =>  19973  (4.99% TTL)
      18  =>  20157  (5.04% TTL)
      19  =>  20088  (5.02% TTL)
      20  =>  20154  (5.04% TTL)

      Which looks pretty good.
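
      (For what it's worth, here is a sketch of my own showing how the tweaked selector might be wrapped up as a reusable sub; the name ordered_sample is made up, but the body is the same keep-with-probability top/bottom idea as above, which is essentially selection sampling.)

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Pick $want values, already ordered, from 1 .. $max where each value
      # may repeat up to $copies times (the code above uses 20, 20, 20).
      sub ordered_sample {
          my ( $want, $max, $copies ) = @_;
          my ( $top, $bottom ) = ( $want, $max * $copies );
          return grep {
              $bottom > 0 && rand(1) < $top / $bottom
                  ? ( --$top, --$bottom, 1 )    # keep this candidate
                  : ( --$bottom, 0 )            # skip this candidate
          } map { ($_) x $copies } 1 .. $max;
      }

      print join( ' ', ordered_sample( 20, 20, 20 ) ), "\n";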