BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I'm testing an algorithm involving 2D coordinate pairs. I've been generating random datasets like this:

my @points = map[ int( rand 500 ), int( rand 500 ) ], 1 .. $N;

Which is okay as far as it goes, but being pseudo-random, tends to pretty equally distribute the points throughout the 2D plane.

But the worse case scenarios for the algorithm are when you get a bunch of points grouped in one place and then a few outliers far away. Either in another bunch, or widespread.

I could generate a point; generate a bunch around that point (using a subrange of the coordinate space, and then fully randomly add a few more, b u u t... that doesn't seem kosher.

So, any thoughts on a 'better way'?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: Randomly biased, random numbers.
by roboticus (Chancellor) on Dec 05, 2013 at 23:11 UTC

    BrowserUk:

    I used to use a "density map" generator back in the day. Basically each time you want a random coordinate, you generate the coordinate plus one extra random number. You then check the number against a density function, and if the extra random number is less than the density at that location, you return the coordinates. Otherwise you try again:

    sub generate_coordinate { my $fDensity = shift; my @coordinate; { @coordinate = (rand, rand); redo if $fDensity->(rand, @coordinate); } return @coordinate; } sub density_uniform { 1 } sub density_h_gradient { $_[0] < $_[2] } sub density_v_gradient { $_[0] < $_[1] } sub density_broken [ 0 } sub density_lookup { my ($chk, $r, $c) = @_; return $lookup_table[$r][$c]<$chk; }

    So by defining an appropriate density map function, you can create many types of distributions. The disadvantage is that your density map function may reject too many candidates, slowing things down. (The density_broken function, for example, is quite useless in this regard.)

    The reason that I thought you might like it is this: For testing a project, I would create distinct distributions by simply drawing an image with MSPAINT or similar and loading that into a lookup table and using a function like density_lookup.

    I don't know your objections to your own suggestion, so I don't know if this one is interesting or not, though.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      I don't know your objections to your own suggestion,

      Basically, it is too directed. That is, it requires parameters to be chosen -- the ratio between clustered picks and non-clustered; the size of clustering subrange; etc. -- which means I would essentially be choosing what to test and thus excluding anything I haven't thought of.

      so I don't know if this one is interesting or not

      This is very interesting. I particularly like the idea of using images -- whether hand-drawn, or grabbed at random from an image search -- to bias the picking process. It has so many possibilities ...

      Eg. grab a random image, process the image with a filter to reduce it to a just points of a particular color or hue; or maybe use a Conway's Life type process to manipulate the pixels until groups of similar hues the reduce to single points; or a dozen other ideas; and then use those points as my dataset.

      The only problem with the idea is that it has triggered so many possibilities for investigation, I might never get back to testing the algorithm :) Thanks for kicking down the doors on the box of my thought train!


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      Math::Vector::Real::MultiNormalMixture generates density functions which may suit the OP case and that can also be randomly parametrized.

      Anyway, one problem with this approach is that the ratio of a discarded points could be too high.

      In that case, a more efficient way may be to divide the plane in regions (i.e. triangles), calculate the probability of every region and then generate the random points first picking a region and then a point inside the region with your proposed algorithm using the conditioned density function.

        salva:

        Yes, the discard rate can be a problem. For the table lookup version in 2D, I had a speedup that worked decently: Generate the first coordinate, then use the same technique to generate the remaining coordinate. It can still be a problem, though, in the event you choose a "mostly black" line, but I never needed made anything better than that.

        I also tried to make a "transformational" technique that wouldn't reject any coordinates, but never got it working well enough to use. (Rather: the technique worked fine, but coming up with the warping functions for the project was more difficult (for me, at any rate), so I simply tended to run the density-mapping version before going to lunch, going home for the day, etc.

        The intention was: Generate a random coordinate, and "remap" it based on a displacement function. I was hoping to be able to turn a density function into a space warping function. The difficulties I had were primarily coming up with functions to warp space appropriately, and ensuring that I could hit any point in the desired output space without too much overhang. (If the function moved the point outside the desired range, you had to reject the point and try again anyway.)

        Update: Made edit above.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Randomly biased, random numbers.
by RichardK (Parson) on Dec 05, 2013 at 23:05 UTC

    I think it might be worth trying polar coordinates, so that each new point is a random distance and direction form the last. Then you can tweak your distance distributions to suit your purpose. If you need to restrict the total area, you could wrap the 2d space or just ignore any points that fall outside.

    Math::Trig has handy polar conversion functions, so it would be easy to give it a try.

      Nice thought. I tried this and several variations on it:

      my $start = [ int( rand 500 ), int( rand 500 ) ]; my @points = map{ my $a = rand( 2*PI ); my $d = int( rand 500 ); my $x = $d * cos( $a ); my $y = $d * sin( $a ); $x -= 500 while $x > 500; $x += 500 while $x < 0; $y -= 500 while $y > 500; $y += 500 while $y < 0; [ $x, $y ]; } 1 .. $N;

      Whilst it certainly tends to create clusters, the clusters tend to be (very) evenly distributed, which doesn't trigger the worse case.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        How about using a inverse log distance distribution? Maybe something like this :-

        my $d = max_distance * ( 1 - log( rand(1000) ) / log( 1000 ) );
Re: Randomly biased, random numbers.
by kcott (Archbishop) on Dec 06, 2013 at 10:09 UTC

    G'day BrowserUk,

    "But the worse case scenarios for the algorithm are when you get a bunch of points grouped in one place and then a few outliers far away. Either in another bunch, or widespread."

    Perhaps the following might be suitable. The "bunch of points grouped in one place" tend towards the centre of the plane; the outliers are typically widespread rather than forming outlying clumps.

    #!/usr/bin/env perl use strict; use warnings; my $rand_bias; # Boolean (for testing) my ($x_high, $y_high) = (80, 80); my $N = 10_000; for my $rand_bias_bool (0, 1) { $rand_bias = $rand_bias_bool; print '*** ', $rand_bias ? 'WITH' : 'NO', ' RANDOM BIAS', " ***\n" +; my @points = map [ gen_rand($x_high), gen_rand($y_high) ], 1 .. $N +; print_matrix(\@points); } sub gen_rand { my $rand_high = shift; return int rand $rand_high unless $rand_bias; # For testing only! my $high_part = $rand_high; my $rand_sum = 0; while ($high_part) { my $rand_arg = 1 + int rand $high_part; $high_part -= $rand_arg; $rand_sum += int rand $rand_arg; } return $rand_sum; } sub print_matrix { my $coords = shift; my %matrix; for (@$coords) { my ($x, $y) = @$_; ++$matrix{$x}{$y}; } for my $y (reverse 0 .. $y_high - 1) { for my $x (0 .. $x_high - 1) { if (exists $matrix{$x}{$y}) { $matrix{$x}{$y} = '@' if $matrix{$x}{$y} > 9; } else { $matrix{$x}{$y} = ' '; } print $matrix{$x}{$y}; } print "\n"; } }

    Here's a sample run showing an (ASCII) graphical representation of the generated data. Note that I've only used an 80x80 grid (rather than your posted 500x500) to get a reasonable display.

    *** NO RANDOM BIAS *** 2 1123 1213 41542213214122 21221 42 2343422 1 333 212111 121314112 +1211 12321 1 1211123132 231223134233 1 111211313 2432412211 2 2 224 1 14 31 2 +11121 31 3 2 12143342121 211 22 11 623132 22211 112121111112 3133 24311332111 +1221122223 11 3 314331 2 4134131 1111 1121 212214 1 4124111 231 2 1131121131 +2 12412111 251 321311 112145214 1111 32112 41441313 1321113213 3 3 213312114 22 +22 13 32 4 12 231 123 15132 3 1 11114 2222 2 11421132 1 22 124 4112213233131 +3 11122 1 122 2312112 12122 2143111123132 122 2123 1233 2 1 322 1121421 13421 + 211217123 51221133 11221222 22 1 4132111221112332 3141122 1211222441221 3132223 +211 1232 2 2231 11123111221 13 1131611234134 13131316 2111133 1 41 1111 2422 3 +321 211122 322 2 12111131 11 2231111152341121113 13 3223332 1 321 241121 132 4 +23313226 2 3132111 2 2 231232 1241624132115221 23322242 13412321112 11 1 1322 +22433 2 3 123133123 11213 21111 3311 122342 1 2 21211 121 115211111 22 3321 12 +11 32 3211 22 11 4154123 11111 314311 222122 2512 22 1243 1111 322 112442 12 25 +1321 22 1 331 21 231 113121122 131 112 3 2441312 214122212 21113 41221232522 31 +3 2113 2 222213 1113442 334 12132 25322 141 22 113111 2 321 13321 56223 323 +3312432 11 12 2 221141122 3 2 2333 2 3 2253131132 3122113112131 23 12 24 222 +52121523 2 3 25 121112 3 1 32 4131232233 123 2 123 1121 1224 24152111131 2 1 + 211 3 2121 2211 33211132312 3131 21 2211313312212331222 231 2131 211 +2231111221 2212221 2231 2112211 124 1332 12 1 114411121 2 21115321121 35 323 +22 1422 1 1 22 521 233131231211313111 1231223222 3 2 2332 1224 1211 1 141 132 +11211331 12 21312411 121 123331 132 2 3121232233431412 22 132123234 21 + 11 113223 2 334 3 1 12111 1132112 12 1422 1 21 41 2213 11311 143 3211122 31 114 +2 2 1 22 1 2212313141 121 11424221113 2212111123 3242 22 321 31 1424 221 11711 +1214 221 1 322211312 2212312 121414122131132131132321311121213111 2 114311211 1 5 + 2 2 1 2 2 213 23 2 2212 2 3133 2131 5123223 23 21 532 22 1 13413242133311 +22113 2342 322232131 13 23311 3 31 3334 121 1214112 1 3 21312 13 123 222111 11 +31411 1211 3124141222111 13 2112 31 22242 121231131 224211 32342133222153223 +1324132313 11613 2111 1113124322 3 2 3121112 112314 322123211 221311 1 2111 +3521231 42 113111 332 112 134121121 21 143422 31 512212 5 33 24221142 4333 222 +1111112113 3 23 51 11 221 11212 221 521111113251424 315 212131112221 1 113322 +14521 1322 41652 2 22121213211532132111 2413112112312 311332 13 31 5111 3224 1121 +4 3332221 1 3211 2111141 211211 1 1212 1 1 111 1 23 151641 3 231 3 5 122 + 11 22112 2123 11231 1142114321 22242321132 1 14221 2311 211121311 31 212 1231 1 + 2 452111 2 111 1412111312 2 3122 21231 6 33 332211 12112311231 21121214 112113 +2 12132131 1321 41212441113 3 141 4326412 1312 112231 231121122222 3 3 1121 321 +12112 1 11 1 131 1332 12 1 14 22241133 41241 2222 422 33 223 333322 1 1823 + 11 22171 11412 1 1221 4 2 2211354231 1321211222214 22223 122 2211312421 2 11 +2 12 2111 2 23162321 121211 3 2243213413 32 1 31 11 4 2 4121 32423 1 8 51 2 +12413212 212 1 1124 1 211121 322132 132423 1122221 125211 1 1211 151222 11 +323 131211 12111 4124 2 24231314211 2 14 212143 112142 1322 2212 3 142341331 + 23341122 225221112111321321322 1 46123123413131 113212421224233 33 2 22 11341 +3121233321 424 231113 31212122423123 34311 112 322 5131122 32 32 1 123 13 211 +151121323 321 121412121 3 2321312141 1 311 2121113113 2 212112 13 1111 3 111 +42222 2 23 51 14223311113 212 43431112 2221 11 24 2122 21 1113321211112222 11 +152213 122 32121 2 23 323223132 21132 22221231 211 323213 22221321 333 2 32 +11 411 3 1 545 111111 122323324 3 33232 122 1121 3 2 2 3 31312112223 21233 42311 +21 14 42 2 2321 2111 3 1111111 342 31 14 34112 11122 214 121 211341151431 +21322231 2 111322131 32 33 1512 122313412 311332215 11131 2 221 22122324 1 233 +142 122 1 111 42 11 21 1222 12111 211 3 322 3 13 11 12122212 12 133 2 2 421 + 22212 12 1312 311341311 1332 53224121111344321 62121122221 214 21122 331 32 + 3 2 21151 11 113213 12141 12112 1212121214 1 3 1 312 332312221332 22212212 1 11 +1211 1 42 12132134 3 4242 2 212 22232411161153222 2321112 214132 4 121 22 2222 1 +1231123521 21112 33 1122322 21342121122 12 111 32111311 121233122212 1 2 12 11 21 +111 41531 2132 211 145121 2 2 12 21 22221 11 2212112121 11 12 61232121 41121 +41331221 2 32 2 3 1 2 332413111 21 231 21 1142221 322331 23 32 2 2134 2224 131 1 +3511211121 41 32112212 32111112 2 1331212243213121 11 33121212 112 2321212123 + 13213 124 241 3232 321 34 311 2 1 1222222442 122131 1 311 221141421 1123 243 1 +42111 1 2 4121321222 211231 23111 14141122 1113 14211341225 3 1122233 12232 15 +2113321122 421122 14311211511 3111131232111252 223132 2112 24411325 1222222 132 +3 1 1 221 23223335111 14 31233231211 11122221112122 1113221 3123 1 1121125 2 +1221 24231 122 24433232211 313431 133 1222133 1543 22132 32 252211 311212121322 +122632 243 212241334144 2 24411312311312 2111211 2 511 213 22231 125 133 132 +1213 32 22 3 4 4321212 2132 2 23 232131 11323 11 214 21321211 31 12 211 3211 +21221 121 2 3111213121142 121 1221311331111135111112123131311121 2 121 21211112 +113 3 624 7 21 11 11322145 124 1131112211 1 3371 321 1214121 412122 21 3121 1 +1213 32 1 41122122 25111 123122 2 11131 123 11223124 31 2 1 511322 11311 +133 111411 523 1211 2211123223231 1 2 11 2 112 21222 2111231 225351221 2111331211 + 4113 113 241 4 432 4 1222113 33412211 21 2111 21322111131211 114 411 12 14 1 +22212 221 2 11412122 3 12 152 112113322 11421433112 22 1 241311212121 11 22 +11233 32 5 21321 41111122 12242213 111111 12231 41222211 2 12222 24 31211 12 112 +2232 14 31 221122221 1 523121 24 1243 1 313332122232122232 11313 222 1 21222 123 +2322114122 3 211111222221213 1121 112123322 31413 1121123211 21 411212332 4412 1 +3 3311121 3 11421222 11 11523222 4233321 123211 1 13211311 11212 1113 2124 +133412321 2232122222121 2 11142342 21123 12312 54223421312112 2 1 1 611 12 11 +421121 161 6 113 113112 2 2 2123 121 211 3 111131214422123311112 11 4342 332 +1 22 221 3 3 6121 12 1532211111123212 213 3111 32224 141 22 112142122233121 +4221121312 14 41221 121224142 211 41 12 123 144 4134421 31241221133 44 1 +122342122 221 1 211311 311213 1 211 12 21 213 1 21 2 2 21 3243212 21 221 24 + 2 433424 21 1224324 12 1152311 21322531143 33 2 1 2242 33211 2111232 4113 +2 2221 4 3 1242 32262 1323 113 3231 23 1 2132321212113221 1 32 32 3 61 13 +12 1 31112 *** WITH RANDOM BIAS *** 1 1 + 1 1 1 + 1 1 1 1 2 + 1 2 1 2 1 1 1 + 1 1 1 12 1 21 11 1 2 1 1 + 1 1 11 1 1 2 111 1 2 1 1 12 1 1 +1 1 11 1 1 1 12 21 12 2 2 21 12 + 1 1 1 121 121 2 1 1 1 22 2 2 2 11 1 + 1 11 1 111 111212 12 3 2 2 12 2 1 1 221 1 2 + 1 1 311 21 1 1 11 1 1 111 11121 12111 1 1 1 11 1 + 1 1 12 1 2 21123 124 2 21 21 231111222 21 11 111 1 2 +1 11 1 1 1 111131 1 212 111311111211123112231 3 1111 1131 122 21 1 + 11 1 1 211 11 1 222321121 1 1 31 111231 1 11 241 1333 1112 + 1 2 2 21 1 213111 225312233 33 12 132111141 211 1 11 2 +1 1 13 1122 123 212 2222 33512451 112222122121 2 1 2 + 1 1 1 11222 1 151211 11123 1 231 121212333333 1 2 2 1111321 + 2 1 1 111 4312 223 111 421 231121243211 146352212312212 211 11 +2 1 2 1 2 12211151 34441342213 213 2212 22 121 2122 12 1 1 + 112 11213 111 422124551223115 3145 4142313 3 14 111 2 2 122 + 111 1 1 1 2 43111381222 13 6212141212214544311112 533 1112 11 + 1 1 1 22 12111 42121113 2221234113462244333122122 41 3134 2313 3 11 + 11 2 12 1 41123413741343 451 5133 5212422 12433321 412 21 11 11 +1 1 1 11 1 1 232124 21 233321531326 2464247494432361 323111411231121131 + 1 2 12 3 21 1 222334451 3226553432154 13 1433234323341 111 1 +1 1 1 1 22 1311122131 425232412 6557541136333442442164112231212211 1 +1111111 1 111 1121 2211313212262442643121524436252663261213544431132 2113 2 +1211 112124 3313 21 231134331522622554346425 374336355461111151221212 3111 +3 1 1 12122 11211 1 45 11 16 63423523366162645 873619 34531113131 22211 + 113 1 1 11 11111113 1225133141516566633455141263155232522 32424 3211 22311 + 11 1 22 111141 421632435724134663435266751331463645563131314 4 11 1 1 + 1 1 1 15 1131211331136548 54615553335115172334641433 65 452122 4142 1 2 +13 12 12 12 111 53533243334 741@@5375232315327441 1443233412121143 2 + 1 112 314 223275437215546152155428443311145346546158222232 22 12 +412 1 1 21111 2 1 212211226623145343637343@54626512374411141431 3311313521 +21 1 1 1 124 1111 244232426513843672232653655243961864333652222 11 223 + 123 1 2 11233312 22142323133323752633418244953529617228944646541 143 13221 + 1 11 212 222112113131 4415326 7522345544645239424132213112413 21114111111 +2 3 11 1 331 2152323731322353336616334449749541236535333377416 66234 3 1 + 1 1 112 11 3252236442553 325394334344885931563129546 422 5211 21 21 +1 1 1 121123 21232227441462473265855474525722132324214412566416313 2521 + 1 1 1 2 11 111 32 1233342555632415366515433892739453117846273 1242114131111 +1 2111 1 3 222524 142 46221541463957263194237@696334326564325414222334 211 + 1 1 1 1 11 121423131225371474677452396325412754364254139221134 1144 213 +2 121 311 22 32544146348655762243438@6542623 3444635554535123432 42 4 + 111 1 1 12 11 1222331253327124425252336658334675555621321435211222312213 + 1 2 2 1 13112338143212531358555688447745354367232113333513163111 21 +151 1 1 111 1 4323345325512254522346477246224522923523334235 15 431111 12 +1 1 1 11124221 4 1524 43733 3411@2237452337344233 45128423512121 13 11 + 2 1 2 2 1 1 14236133362312214435244542286541875 53322832112123151 121 2 + 11 1 1 2513 3 11 33236432661465424466731558454323433655423324431 511 21 +11 11 112 12312341532246 624762124353244425536336664842 41 13155341231133 1 +21 1 1 22 1114121322 1323139641633563647154542336353536315331324112513312 + 31 1 1 1 1111 322 11446 5238354232442522445457464145324525353142432112 1 +2 11 24 33113 5314 52 338453661232423166575541133 4213124 44311211 22 +1311 11 11 132141743136222162131372446246754534523524322 31356123 31111 +1 11 2 11111 1 1114322 411155223231574431436855534512333212 64 11 1314 31 + 1 1 313121123223313514 4134 46 312551234475 6124232524121113 12111 + 1 11 1 2 1 2 13222412256221 1543352225556349222283221433534 221331 12 1 +2 221 1 1 1 11 1 152462313326321412346424252214647436112362132222 1 3 1 +1 21 1 12 111 121113121561365333133543313 135511 2434442132132 112124 2 +1 11 222 213 1213343433413154111311235162 16225 23424 6214315 3 2 2111 +1 21 1 4 1223 12245431 21434144273533 22 1331 32 4131 3531221 2 +22 1 11 1 23 1211132221522341426712 351312215122 334112212 1 +2111 1 1 1 1 1124122121114 144312541535343 333 323 2 242122 21431 21 +1111 12 11 11332112132423231243633351 2253223113213 451 33 41231 21 11 + 2 11 21 122 21 31133132142321341153422 25122 23132 21 2222331131 22 + 1 11 2221 323 2 14322511142236324212122 32111 3211111 111 1121 1 + 112 2 1 2 1111 11 3115313 5311222241121 3332324 13322 12 331 11124 + 1 1 421132 233 3122 421132112164 2212213112111124112 11 + 1 11 12 1112 3111 41113122121133 42123 1 2 21 11 11 11 12 1 11 + 1 1 11 2 2 1 11 1332312132 211 11 1312 2 1121 121 11 + 1 1 1 311 2221221313 212 113123 1112121 111 1 21 1 + 1 1 3 1 1 2 21 1114131131111 11 1 11 12115 2 121 2211 1 2 + 1 2 2 1 22 2 1 31221 12 1 213211122212 111113 1 1 211 + 1 1 21112 2211 2 111 1 3 1321122111111 323 11 1 + 1 1 2 11 1 1 11 111112 1 1 1 11 1 1 1 1 11 + 1 1 1 11211211 121 121 2 1 1 1 11 1 1 1 + 1 1 2 1 1112213 1 1 1 1 1121 11 1 + 1 1 1 1 21 1 +1 1 1 1 11 1 1

    Update: I removed two lines with $loops which I'd used for testing but aren't part of the solution.

    -- Ken

      That does induce clumping, but it is always just one clump, always in the middle, and very uniform from one run to the next, which doesn't make for good test data.

      I want multiple clumps -- 1, 2, 3, or 4 -- and unevenly distributed -- just 2 big clumps in one corner; or 4 small clumps all down one side or ... -- and different on each run.

      Everything mathematical I've tried, tends to produce uniformity, which is kind of it's thing.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Randomly biased, random numbers.
by educated_foo (Vicar) on Dec 06, 2013 at 01:55 UTC
    Interesting question. You might want to be more explicit about exactly what characteristics you want in this distribution, i.e. what phenomenon generates your data. For example, I'm not a physicist, but for beams of particles with varying intensities hitting a detector plate subject to random noise:
    • Cluster means distributed uniformly over a rectangular area.
    • Number of points per cluster following an exponential distribution (or whatever the beam intensities follow).
    • Points in each cluster distributed as a 2D Gaussian. (With variance fixed or chosen according to some distribution?)
    • Number of clusters either fixed or chosen according to some distribution.
      You might want to be more explicit about exactly what characteristics you want in this distribution,

      That's hard. Mostly because I definitely do not want any formally defined distribution. Almost exactly the opposite in fact.

      The problem with "random" is that on average, the points will be evenly distributed over the range (2D plane in this case). That's almost the exact definition of PRNG.

      Whilst with enough iterations, all possible distributions -- including those where a majority of the points in the sample size tend to be grouped or clustered on one side or in one corner of the plane -- with even a relatively small plane, (500,500) and sample size 100 -- there are so many 'roughly even' distributions and so few 'lopsided' distributions, that I'd need to run billions of sets to ensure I'd tested a few of the lopsided ones. That's not practical.

      So, I'm looking for a way to induce lopsided -- which pretty much means not 'roughly evenly distributed' -- distributions, without prescribing where or why the concentrated and sparse bits will be.

      I can't think of any better description than: I want lopsided distributions that are completely randomly generated.

      Not good I know, but its the best I've got.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I definitely do not want any formally defined distribution.
        Do you have sample data you can perturb, mix, or otherwise use to generate test data?
        I want lopsided distributions that are completely randomly generated.
        Hm... Maybe transform your PRNG through a random, monotonic, nonlinear mapping? e.g. generate a piecewise-linear function (or spline) in each dimension with steps taken from the PRNG, then generate uniform random points and apply the function to them. I suspect a Real Statistician would scoff, but I am not such a person.
        That's hard. Mostly because I definitely do not want any formally defined distribution

        Download random pictures from the Internet and use them as the base to generate density functions.

        You may apply some simple transformations (for instance, dynamic range decompression) to obtain more disparate distributions.

Re: Randomly biased, random numbers.
by Corion (Patriarch) on Dec 06, 2013 at 08:10 UTC

    I wanted an "aesthetic" distribution of points over a plane, so I wrote Random::PoissonDisc. This module tries to distribute points "equally" across a plane with a minimum distance between each other.

    At the release time of the module, I wrote a blog post on Random::PoissonDisc which has graphics showing the distribution differences of "white noise" random and the "blue noise" random that Random::PoissonDisc produces.

    If the distribution properties of Random::PoissonDisc match your requirements, you will likely want to reimplement the module using PDL if you want better performance. The paper which outlines the algorithm is also linked from the module documentation.

      Actually, that seems to be almost the antithesis of what I'm after. It seeks to restrain the randomness to a regular(ish) grid pattern. My problem is that randomly picking points in the plane is too uniform.

      I want clumps, but I want them at a larger scale than they form naturally from purely random distribution.

      However, the article you linked, linked an article that linked an article on (something possible or possibly not) called "Perlin Noise" which, whatever it is called, looks like it could be just what I need ... Thanks for the lead.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Randomly biased, random numbers.
by Laurent_R (Canon) on Dec 05, 2013 at 23:18 UTC
    Hmm, I had a similar problem a couple of years ago, trying to test worst case scenarios, although in a totally different context. I don't claim at all to be an expert on this kind of things, but my feeling is that if you want to test worst case scenarios, then you have to bite the bullet and accept that your input data is not going to be random, and there is nothing wrong with that since you are specifically looking for not random situations. Then, of course, the problem is that you don't necessarily know for sure what is the true worst case scenario, sometimes it is actually very difficult to find out. For example, Shell sort and comb sort are known to be fairly efficient sorting algorithms, but they are very rarely used because nobody seems to really know what their worst case scenario could be and it is therefore difficult to assess their worst case complexity.
Re: Randomly biased, random numbers.
by QM (Parson) on Dec 06, 2013 at 09:46 UTC
    Sounds like you want to a number of distributions, from "one big clump" to "almost uniform", at random.

    What about generating several "poles" as clump attractors, each with a random weight for attractiveness. Then let each point "roll" down hill according to the attractor weights and distances. So generate 3D coordinates, and use the extra value as the weight. Something like this:

    for my $trial (1..$max_trials) { my @attractors; for my $attractors (1..int(rand($max_attractor))+1) { my @attractor = get_random_tuple(3,\@attractor_limits); push @attractors, \@attractor; } my @grid_points; for my $points ($min_points..$max_points) { my @point = get_random_tuple(2,@grid_point_limits); # $alpha limits the distance moved downhill, and could be rand +om @point = roll_down_hill(\@point, \@attractors, $alpha); push @grid_points, \@point; } my $result = analyze_data(\@grid_points); } sub get_random_tuple { my $size_of_tuple = shift; my $limits_ref = shift; my @tuple; for my $limit (@$limits_ref) { push @tuple, rand($limit); } return @tuple; } sub roll_down_hill { # left as an exercise for OMAR! }

    You can also play with variations of roll_down_hill to try different distributions, going from sub-linear to exponential (or maybe some of each), trying a host of different functions for transformations. And don't forget you can play with the distributions wherever you take a random number, including the number of attractors and their weights. For instance, you might find that fewer attractors are more likely to cause worst case behavior, and bias the number of attractors in the small direction.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Randomly biased, random numbers.
by salva (Canon) on Dec 06, 2013 at 09:50 UTC

      Yes. I remember playing with them for days.

      Whilst it does eventually produce really nice clustering, it does take an inordinate amount of time. And then there is the problem of deciding when is enough.

      I'd prefer to find something a little more direct. (BTW:Love the vids.)


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Randomly biased, random numbers.
by SuicideJunkie (Vicar) on Dec 06, 2013 at 19:05 UTC

    After skimming through the thread, my idea is:

    1. Generate a density gradient image with the clouds render in gimp/photoshop.
    2. Apply distortions for flavour.
    3. Save above as a macro so you can generate lots of distributions with one button press.
    4. Save as greyscale BMP
    5. For each pixel, convert it to a sample point if brightness > rand(1000)

    Note: I presume you don't need an exact number of samples, but if you do, just shuffle the list and then pick the first N.

    EG:
    http://i.imgur.com/OvPWjWf.png - GIMPed Difference Clouds + more contrast + gamma correction for darker darks
    http://i.imgur.com/Di5wzG4.png - Resulting random samples. (about 3000)

Re: Randomly biased, random numbers.
by GrandFather (Saint) on Dec 08, 2013 at 10:04 UTC

    A little late to the party, but maybe interesting. The following code generates a random set of "attractors" which tend to suck near by randomly generated points closer to the attractor. Attractors have a radius which limits their effect. Nearby attractors fight with each other which results in oddly shaped clumping, which is most likely a desirable outcome.

    The script above plots results using Tk and includes a little commented out code that was used while tuning the code. At present both the original points and the biased points are plotted.

    True laziness is hard work
      A little late to the party, but maybe interesting

      Thanks for the reply and code.

      I played with this a little and it definitely produces clumps. The problem with it (for my purposes) is that it always leaves a background of relatively uniformly distributed points, and I couldn't see any way of preventing that.

      For my purpose, I really want clear space around and between the clumps, which the weight map method both achieves and gives fine control over.

      However, your "attractors" idea sparked another idea in me -- completely unrelated to OP purpose -- but intriguing enough to sidetrack me for a day or so.

      What happens if you treat a set of clumpy random points as equally sized particles of some finite mass, and have them all exert "gravity" upon each other according to the inverse square of their distance.

      Tada. The red dots are the starting position, the cyan their position after a single round of attraction, and the blue, after the second.

      Ignore the ones that 'sling-shot' off into outer space; the effect of the inverse square law is that in order to get detectable influence over any distance, you have to set the mass of the points quite high, with the consequence that once they get very close to each other, they exert huge forces that results in sling shots because there are no collisions.

      Over many generations, the gravity will cause all the points to come together (and then be scattered in all directions), but the first few generations have the effect of concentrating whatever clusters already exist. I wonder if this wouldn't be a fruitful technique for tackling the NP-hard clustering problem without resorting making assumptions per the K-means type of algorithm?


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Randomly biased, random numbers.
by CountZero (Bishop) on Dec 06, 2013 at 07:26 UTC
    Did you already have a look at Math::Random::OO?

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      No, but I'm not looking for 'a better rand', nor a pointlessly OO one.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        It can "plug in" (and actually also provides) different types of distribution function which may better serve your needs.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
Re: Randomly biased, random numbers.
by roboticus (Chancellor) on Dec 07, 2013 at 15:56 UTC

    BrowserUk:

    Here's another idea that occurred to me, so I thought I'd toss it out--build a function that generates a coordinate and then randomly maps it into a "clump" for you. In order to prevent any of your area from being ignored, there's also a chance that it won't be mapped to a clump:

    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; sub generate_clumpy_rand { my $num = shift // int(10*rand()+3); my @clumps; print "Generator has $num clumps\n"; for my $i (1 .. $num) { my @center_size = (rand, rand, rand, rand); print "clump 1 centered at ($center_size[0],$center_size[1]) " . "size is $center_size[2]x$center_size[3]\n"; push @clumps, \@center_size; } return sub { my @coord = (rand, rand); # Ensure we have a background of random dots over entire regio +n return @coord if (.1 > rand); # Choose a clump, and map the coordinate into that clump my @clump = @{$clumps[rand @clumps]}; return ( ($coord[0]*$clump[2])+$clump[0], ($coord[1]*$clump[3])+$clump[1] ); } } my $clump_rand = generate_clumpy_rand(1); my @coord = $clumpy_rand->(); print Dumper(\@coord);

    The good point is that it's simple and fast. It's a proof-of-concept, though, and I didn't do any work to ensure that the clumps don't extend beyond the desired range. If you like the idea, then that task is left as an exercise for the reader ;^D.

    Update: I forgot to mention: The clumps in this version are rectangular. Changing their shape is possible, but I didn't bother doing that, either.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      That's essentially a variation on my "I could generate a point; generate a bunch around that point (using a subrange of the coordinate space, and then fully randomly add a few more, ".

      With a few tweaks to constrain the output to the range 0-1, that does produce definite clumps and some odd ones. the main problem with it is the hard edges to the clumps with no graceful transition.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Randomly biased, random numbers.
by Lennotoecom (Pilgrim) on Dec 05, 2013 at 22:09 UTC
    the whitenoise is the only true way mate
Re: Randomly biased, random numbers.
by hdb (Monsignor) on Dec 06, 2013 at 08:46 UTC

    Try to find pictures of copulas, google "copula plots". Those should give you ideas how you want your data to look like. Once you decide for one, there usually is an algorithm to generate the data corresponding to that copula.

      google "copula plots".

      Neither text search nor image search for that term produced anything that (to me) seemed relevant. If something stood out for you, could you please post a direct link?


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.