I definitely do not want any formally defined distribution.
Do you have sample data you can perturb, mix, or otherwise use to generate test data?
I want lopsided distributions that are completely randomly generated.
Hm... Maybe transform your PRNG through a random, monotonic, nonlinear mapping? e.g. generate a piecewise-linear function (or spline) in each dimension with steps taken from the PRNG, then generate uniform random points and apply the function to them. I suspect a Real Statistician would scoff, but I am not such a person.
generate a piecewise-linear function (or spline) in each dimension with steps taken from the PRNG, then generate uniform random points and apply the function to them.
That sounds like a real possibility -- or rather looks like it, having done a search for "spline" and seen a few images. I'm imagining sticking a few (2 or 3 or 4, decided at random) sticks, of random length, into a square of ground at randomly chosen points; and then draping a tarpaulin over them. The height of the tarpaulin at any given point then "influences" the randomly generated xy pairs such that they tend to concentrate around the sticks.
I haven't a clue how I'd go about it though :( (Offers?:)
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
You more or less have it.
I haven't a clue how I'd go about it though :( (Offers?:)
For my standard consulting rate? ;-)
Basically, you want to take a uniform distribution between 0 and M, and map it to a non-uniform one between 0 and N, so that X_I < X_J when I < J (the mapping preserves order). One way to do this is to generate a piecewise linear function, then translate values through that function. In Matlab, it looks something like this for a 10-step function:
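fx = [0,cumsum(unifrnd(0,1,1,10))];   % 11 increasing knots = 10 linear segments
tmp=unifrnd(1,11,1,1e5);              % uniform positions along the 10 segments
ix=floor(tmp);                        % index of the knot to the left
dx=rem(tmp,1);                        % fractional distance into that segment
values = (fx(ix) + (fx(ix+1)-fx(ix)).*dx)./fx(end);   % interpolate, scale to [0,1]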
Using a spline would be more complicated, but would be smoother.
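For anyone who would rather stay in Perl, here is a rough sketch of the same piecewise-linear idea. The names make_knots and warp are made up for illustration, and the tiny 1e-9 added to each step is only my assumption to keep the knots strictly increasing; treat it as a starting point rather than a finished routine.

```perl
use strict;
use warnings;

## Build the knots of a random, increasing, piecewise-linear function:
## accumulate $steps random positive increments, then normalise so the
## last knot is exactly 1. (make_knots is a hypothetical helper name.)
sub make_knots {
    my( $steps ) = @_;
    my @fx = ( 0 );
    push @fx, $fx[-1] + rand() + 1e-9 for 1 .. $steps;
    return [ map { $_ / $fx[-1] } @fx ];
}

## Map a uniform value $u in [0,1) through the function by linear
## interpolation between the two neighbouring knots.
sub warp {
    my( $knots, $u ) = @_;
    my $t  = $u * $#$knots;    # position measured in segments
    my $ix = int $t;           # knot to the left
    my $dx = $t - $ix;         # fraction of the way through the segment
    return $knots->[$ix] + ( $knots->[$ix+1] - $knots->[$ix] ) * $dx;
}

my $knots  = make_knots( 10 );
my @values = map { warp( $knots, rand() ) } 1 .. 1e5;
```

Values pile up where the segments are flat and thin out where they are steep. Running one such function per dimension gives lopsided xy pairs, though the two axes stay independent of each other.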
That's hard. Mostly because I definitely do not want any formally defined distribution
Download random pictures from the Internet and use them as the base to generate density functions.
You may apply some simple transformations (for instance, dynamic range decompression) to obtain more disparate distributions.
Indeed. That's pretty similar to the ideas I had -- "Eg. grab a random image, process the image with a filter to reduce it to just points of a particular color or hue; or maybe use a Conway's Life type process to manipulate the pixels until groups of similar hues reduce to single points; or a dozen other ideas; and then use those points as my dataset." -- triggered by roboticus' post.
However, it turns out to be rather more difficult than I imagined.
I thought of two ways to tackle this approach:
- Try to derive the points for my test data directly from the randomly chosen images.
It is fairly easy to manually pick and apply a few filters to any given image to reduce it to a bunch of discrete pixels -- converting to grey scale, then explosion followed by embossing works well for many images; as does repeatedly applying a high filter until the number of non-black pixels is reduced to a usable number -- but finding a single sequence of filters that produces good datasets from a wide range of images is very hard.
And even when doing this manually, it is surprising how often, once you have succeeded in reducing the image to discrete pixels, they end up being pretty uniformly distributed.
- Use the color or luminance or hue of the images to weight the picking of 'random' pixels.
This is also quite hard to do other than via the rejection method -- pick a random pixel and reject if the chosen attribute is above or below some cut-off value -- which can be very time consuming.
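As a rough sketch of what rejection looks like: the version below is the probabilistic form (accept a pixel with probability weight/$MAX) rather than a hard cut-off, and the small @map is a made-up stand-in for whatever attribute would be read from the image.

```perl
use strict;
use warnings;

## Hypothetical stand-in for an image attribute (e.g. luminance), 0 .. $MAX.
my $MAX = 10;
my @map = (
    [ 0, 0,  0, 0 ],
    [ 0, 5, 10, 0 ],
    [ 0, 3,  6, 0 ],
    [ 0, 0,  0, 0 ],
);

## Pick a pixel uniformly, then keep it with probability weight/$MAX;
## otherwise throw it away and try again.
sub pick_by_rejection {
    while( 1 ) {
        my $y = int rand @map;
        my $x = int rand @{ $map[0] };
        return [ $x, $y ] if rand( $MAX ) < $map[$y][$x];
    }
}

my @points = map { pick_by_rejection() } 1 .. 1000;
```

The cost is easy to see here: three quarters of the cells have zero weight, so most tries are thrown away, and it gets worse the sparser the map is.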
The only other method I came up with was to construct a 'weight stick'. Eg.
Say this represents the 2D weights map:
+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 5| 5| 4| 3| 3| 2| 2| 1| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 5|10| 8| 6| 5| 4| 3| 1| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 3| 6| 5| 5| 5| 5| 4| 2| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 1| 2| 3| 4| 5| 6| 6| 3| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 1| 2| 3| 5| 5| 4| 3| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 1| 2| 4| 3| 3| 2| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 1| 2| 1| 2| 1| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
Then I build a 1D vector containing the (pixel coordinate pair) x its weight:
([0,0])x 0, ([1,0])x 0, ([2,0])x 0, ...
([0,1])x 0, ([1,1])x 5, ([2,1])x 5, ([3,1])x 4, ([4,1])x 3, ...
([0,2])x 0, ([1,2])x 5, ([2,2])x 10, ([3,2])x 8, ...
...
(I packed these into a scalar to save space.)
Now, to pick pixels, I randomly index into the vector and get one value for every pick. The picking is fast, but the construction is relatively slow. And the higher the range of weight factors, the more memory it takes and the longer it takes to construct, but it works very well.
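Stripped to its bare bones, the weight stick looks something like this. The 3x3 @map is invented purely for illustration, and rnd_point is a throwaway name.

```perl
use strict;
use warnings;

## A toy weights map (made up for illustration); total weight is 8.
my @map = (
    [ 0, 0, 0 ],
    [ 0, 5, 2 ],
    [ 0, 1, 0 ],
);

## Build the stick: each pixel's packed (x,y) pair, repeated weight times.
my $stick = '';
for my $y ( 0 .. $#map ) {
    for my $x ( 0 .. $#{ $map[$y] } ) {
        $stick .= pack( 'vv', $x, $y ) x $map[$y][$x];
    }
}

## Each entry is 4 bytes ('vv'), so pick a random entry and unpack it.
sub rnd_point {
    my $i = int rand( length( $stick ) / 4 );
    return [ unpack 'vv', substr( $stick, $i * 4, 4 ) ];
}

## Tally a few thousand picks to check they follow the weights.
my %count;
$count{ join ',', @{ rnd_point() } }++ for 1 .. 8000;
```

The stick takes 4 bytes per unit of weight, which is where the memory cost comes from: doubling the weight range roughly doubles the size of the stick.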
Once I had this working, I was still not finding a good way to produce good weight maps from randomly chosen images. So then I decided to try and construct good weight maps randomly, but directly.
This took a little trial and error, but I've come up with a method that seems to work quite well. It's still somewhat crude and I need to iron out some edge cases, but I've posted the code below.
To generate the weight maps, I pick a few random points and pick a random weight for those points. Then I grade those high points out to the edges of the area in the x-axis. Then I grade those values to the strips of values created by the other points, or the edges in the y-axis.
Drawn in grey scale, this produces weight maps like these: img img img, which I'm rather pleased with.
Once weight-maps like these have been vectorised and then used to pick 1000 weighted-random pixels, the results look like these: img img img.
The results are everything I could have hoped for; though the current implementation leaves a lot to be desired - especially the slowness of the vectorisation when a higher weight range is used. I'll probably have to move that process and the grading process into C to make this usable.
If you can see improvements to either the grading process -- which currently occasionally produces really bizarre effects for reasons I haven't tracked down -- or ways of speeding up the vectorisation without dropping into C, I'd be very interested to hear them.
The current code:
#! perl -slw
use strict;
use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 1000;
use GD;

use constant { X => 0, Y=> 1, R => 2 };

sub rgb2n{ unpack 'N', pack 'CCCC', 0, @_ }

my $RED     = rgb2n( 255, 0, 0 );
my $GREEN   = rgb2n( 0, 255, 0 );
my $BLUE    = rgb2n( 0, 0, 255 );
my $YELLOW  = rgb2n( 255, 255, 0 );
my $MAGENTA = rgb2n( 255, 0, 255 );
my $CYAN    = rgb2n( 0, 255, 255 );
my $WHITE   = rgb2n( 255,255,255 );

sub gen_rand {
    my $rand_high = shift;
    my $high_part = $rand_high;
    my $rand_sum = 0;
    while ($high_part) {
        my $rand_arg = 1 + int rand $high_part;
        $high_part -= $rand_arg;
        $rand_sum += int rand $rand_arg;
    }
    return $rand_sum;
}

our $N //= 1000;
our $X //= our $Y //= 500;
our $W //= 10;

## Initialise the weight map to zeros.
my @map = map[ ( 0 ) x $X ], 1 .. $Y;

## Pick a random number of random 'peak' points.
my @peaks = map[ int rand $X, int rand $Y ], 1 .. ( 1 + rand( 8 ) );

for my $peak ( @peaks ) {
    ## pick a random value for this peak
    my $val = 2 + int rand $W;

    ## and grade it out to the left edge (if the peak isn't at the left edge)
    if( $peak->[X] > 0 ) {
        my $delta = $val / ( $peak->[X] );
        $map[ $peak->[Y] ][ $_ ] = $map[ $peak->[Y] ][ $_-1 ] + $delta
            for 1 .. $peak->[ X ];
    }
    ## set the peak itself, so the right-hand grade starts from $val
    ## even when the peak sits at x == 0
    $map[ $peak->[Y] ][ $peak->[X] ] = $val;

    ## and grade it out to the right edge (if the peak isn't at the right edge);
    ## stop at $X-1 so we never write past the end of the row
    if( $peak->[X] < $X-1 ) {
        my $delta = $val / ( $X-1 - $peak->[X] );
        $map[ $peak->[Y] ][ $_ ] = $map[ $peak->[Y] ][ $_-1 ] - $delta
            for $peak->[ X ]+1 .. $X-1;
    }
}

## Now grade out between the x-lines and the top and bottom edges.
for my $x ( 0 .. $X-1 ) {
    my $first = 0;
    for my $second ( map( { $map[$_][$x] != 0 ? $_ : () } 0 .. $Y-1 ), $Y-1 ) {
        ## skip zero-length spans (a graded line on row 0 or row $Y-1),
        ## which would otherwise divide by zero
        next if $second == $first;
        my $delta = ( $map[ $second ][$x] - $map[ $first ][$x] ) / ( $second - $first );
        $map[$_][$x] = $map[$_-1][$x] + $delta for $first+1 .. $second;
        $first = $second;
    }
}

## Draw and display the weight map as grey scale image for visualisation and checking
my $im = GD::Image->new( $X+2*100, $Y+2*100, 1 );
$im->fill( 0, 0, $WHITE );
$im->rectangle( 100, 100, $X+100, $Y+100, 0 );

for my $y ( 0 .. $#map ) {
    for my $x ( 0 .. $#{ $map[0] } ) {
        ## a peak can be as high as $W+1, so clamp to keep pack 'C' in range
        my $rgb = int( int( $map[$y][$x] ) * 255 / $W );
        $rgb = 255 if $rgb > 255;
        $im->setPixel( 100+$x, 100+$y, rgb2n( ( $rgb ) x 3 ) )
    }
}

open PNG, '>:raw', "$0.png" or die $!;
print PNG $im->png;
close PNG;
system "$0.png";

## Vectorise the weight map to a weight stick
my $stick = '';
for my $y ( 0 .. $#map -1 ) {
    for my $x ( 0 .. $#{ $map[ 0 ] } -1 ) {
        ## The -3 ensures isolated islands of points; but creates an edge case.
        my $packed = pack( 'vv', $x, $y ) x ( $map[ $y ][ $x ] -3 );
        $stick .= $packed;
    }
}

## a sub that uses the weight stick to generate weighted random values.
sub rndPoint {
    my $rnd = int rand( length( $stick ) / 4 );
    my @point = unpack 'vv', substr( $stick, $rnd*4, 4 );
    return \@point;
}

## generate some points
my @points = map rndPoint(), 1 ..$N;

## Draw the points over the weight map for checking
$im->filledArc( 100+$_->[X], 100+$_->[Y], 5, 5, 0, 360, $RED ) for @points;

## and display it.
open PNG, '>:raw', "$0.png" or die $!;
print PNG $im->png;
close PNG;
system "$0.png";
Use the color or luminance or hue of the images to weight the picking of 'random' pixels. This is also quite hard to do other than via the rejection method
There is a much more efficient method. See here, and here.
The trick is to build a 1D array with the accumulated weights @acu. Then, pick random numbers ($r) in the range [0, $acu[-1]) and use binary search to look for the index $ix such that $acu[$ix] <= $r < $acu[$ix + 1].
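A minimal sketch of that, assuming the weights have already been flattened into a 1D list (for a 2D map, index = y * width + x). The @weights values and the name pick_index are just for illustration. Note that @acu here has no leading 0, so the search returns the first index whose accumulated weight exceeds $r, which is $ix + 1 in the notation above.

```perl
use strict;
use warnings;

my @weights = ( 0, 5, 10, 8, 0, 3 );    # hypothetical per-pixel weights

## $acu[$i] holds the sum of weights 0 .. $i.
my @acu;
my $sum = 0;
push @acu, ( $sum += $_ ) for @weights;

## Binary search for the smallest index whose accumulated weight
## is greater than a random $r drawn from [0, total weight).
sub pick_index {
    my $r = rand $acu[-1];
    my( $lo, $hi ) = ( 0, $#acu );
    while( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if( $acu[$mid] > $r ) { $hi = $mid } else { $lo = $mid + 1 }
    }
    return $lo;
}

## Tally some picks to check they follow the weights.
my @hits = ( 0 ) x @weights;
$hits[ pick_index() ]++ for 1 .. 26000;
```

Memory is one number per pixel regardless of how large the weights are, construction is a single pass, and each pick costs O(log n) comparisons. Zero-weight entries are never returned, and fractional weights work without any rounding.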