Randomizing Large Integers

hangon has asked for the wisdom of the Perl Monks concerning the following question:

I need to randomize large integers (> 32K). I looked at some modules, but I just need to call $foo = rand $bar repeatedly without setting up objects & parameters. So, I came up with the following scheme to extend the range of rand to around 1G. Basically, it breaks $bar into blocks of nearly equal size (within +1 -0) and calls rand twice, once to select a block, and again to select the number within the block, then it totals everything up. It appears to DWIM and seems a reasonable approximation of rand (the distribution of numbers generated compares closely to rand), but I don't have the expertize to test it rigorously.

It's to be used for simulation, and doesn't need to be perfect. Does this seem like a reasonable approach? Am I overlooking any potential problems?

sub bigrand{

    my $in = shift;
    my $limit = 32768;
    if ($in <= $limit){

        return int rand $in;
    }

    my $blockSize = int($in ** .5);
    my $blockSelect = int rand int ($in/$blockSize);
    my $leftOver = $in % $blockSize;

    if ($blockSelect < $leftOver){

        return  ($blockSelect * ($blockSize + 1)) + 
                int rand ($blockSize + 1);

    }else{

        return  ($leftOver * ($blockSize + 1)) +
                (($blockSelect - $leftOver) * $blockSize) +
                int rand ($blockSize);  
    }
}
[download]

Update: It's been an education. Up to now, rand is one of those functions I've just used and taken for granted. Thanks especially to BrowserUK for the explanation and links to other nodes, and to tachyon-II whose module Math::Random::MT::Perl I ended up using.

Comment on Randomizing Large Integers Download Code

Replies are listed 'Best First'.
Re: Randomizing Large Integers by BrowserUk (Patriarch) on May 19, 2008 at 18:00 UTC
I'd strongly recommend Math::Random::MT. Install it, add `use Math::Random::MT qw[ rand ];` at the top of your program and you've got excellent, tested randomness upto 0 -> 4GB. Simple. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: Randomizing Large Integers by hangon (Deacon) on May 19, 2008 at 20:04 UTC
A number of programs have already been written for this simulation, so I actually need to override rand globally using one of its common modules. Math::Random::MT would probably be perfect except for the umm... "political" issues regarding compilers on this system. Thanks for the suggestion though.	[reply]
Re^3: Randomizing Large Integers by BrowserUk (Patriarch) on May 19, 2008 at 20:36 UTC
I'm assuming you're on a Win32 system (you mention >32k integers. Win32 CRT rand is limited to 15-bits of output. Most other systems (nix at least) have 48-bit rands in their CRTs). If that assumption is correct, you might be able to bypass the compiler issue by doing a binary install from PPD. You don't indicate how rigourous your random requirements are, but the problem with deriving* more bits of randomness from less is that the process invariably acts to compound the flaws in the underlying PRNG. If you plot the values from the Win32 Perl rand function on a 2D grid (jpg or png) and increment the color of the pixel each time you generate the same value, you very quickly see distinct banding appear where the distribution of the output is very non-uniform. If you combine random values from a flawed PRNG in an attempt to increase teh number of bits, it almost invariably compounds the banding effect and leads to a highly skewed output distribution. Essentially, you cannot manufacture more bits of randomness, only re-use the existing ones, and that leads to a Doppler effect. The rarified (below random chance) parts of the distribution tend to get further rarified, whilst the above average parts of the distribution tend to combine and reinforce each other leading to very uneven distributions. Maybe this doesn't affect your application, but if your algorithm depends upon an even distribution for its operation (eg. Monte Carlo Simulations), then you'd be well advise to seek a good PRNG up front rather than discovering that you are skewing your results from the outset. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^3: Randomizing Large Integers by tachyon-II (Chaplain) on May 20, 2008 at 11:10 UTC
Here is a pure Perl version of Math::Random::MT::Perl. About 1/3-1/2 as fast as the C/XS but still good for 100,000 well distributed 32 bit random numbers per second. Output identical to the C version. ~~Coming to~~ Latest version at Math::Random::MT::Perl Read more... (6 kB)	[reply] [d/l]
Re: Randomizing Large Integers by mscharrer (Hermit) on May 19, 2008 at 18:25 UTC
Is 32K a limitation of the standard rand function? I figured that in my installation (v5.8.8 built for i486-linux-gnu-thread-multi) rand works for given numbers up to 232. But I couldn't find any documentation for the limit. A 32 bit limitation seems logical, however. In my case I needed random hex numbers with 2N limits, so I just concatenated multiple rand calls using `printf`. On Linux you also can read from `/dev/urandom` to get random bytes.	[reply] [d/l] [select]
Re^2: Randomizing Large Integers by almut (Canon) on May 19, 2008 at 19:48 UTC
Is 32K a limitation of the standard rand function? The macro (`uconfig.h`, line 3368 in 5.8.8 or 4057 in 5.10.0) looks like only 15 bits are being used: `#define Drand01() ((rand() & 0x7FFF) / (double) ((unsigned long)1 << +15)) ^^^^` [download] Update: though, a somewhat closer look suggests this might be configuration dependent (at least, `Configure` checks for `drand48()`), so maybe I spoke too soon... Update2: just ran a test build on a system which has `drand48()`... et voila, the macro ends up being defined as `#define Drand01() drand48()` [download] So, as often, the answer is: it depends :)	[reply] [d/l] [select]
Re^3: Randomizing Large Integers by mscharrer (Hermit) on May 19, 2008 at 20:29 UTC
15 bit of the input or of the output? I get the following behavior that lets me assume that 32 bit random integers work: `# perl -e 'printf "%010x\n", int rand 216' 00000052e2 # perl -e 'printf "%010x\n", int rand 232' 006f7ae0c5 # perl -e 'printf "%010x\n", int rand 264' 00ffffffff` [download] i.e. `rand 2N` seems to return a `N` bit random integer, up to `N = 32` but not above.	[reply] [d/l] [select]
Re^4: Randomizing Large Integers by BrowserUk (Patriarch) on May 19, 2008 at 21:25 UTC
Re^4: Randomizing Large Integers by ikegami (Patriarch) on May 20, 2008 at 16:59 UTC
Re^2: Randomizing Large Integers by hangon (Deacon) on May 19, 2008 at 20:23 UTC
From what I've read, rand is dependent on whats installed on the OS. On this system, when feeding rand with integers > 32768, the distribution of values returned (from multiple calls using the same number) breaks down badly. So this seems to support 15 bits.	[reply]
Re^3: Randomizing Large Integers by syphilis (Archbishop) on May 20, 2008 at 08:01 UTC
Running `perl -V:randbits` will tell you how many random bits your perl's rand() has. Cheers, Rob	[reply] [d/l]
Re^4: Randomizing Large Integers by ysth (Canon) on May 21, 2008 at 04:49 UTC