in reply to Re: How likely is rand() to repeat?
in thread How likely is rand() to repeat?

But that's all irrelevant. Your random number generator is likely to only have 24, 32, 48 or 64 bits of entropy. Which means that most of the possible strings will never be generated. Even with 64 bits of entropy, there are only 18446744073709551616 strings possible. Which makes is far more likely for collisions to happen.

That would only be the case if he was only calling rand() once to pick from the 6.45e44 strings. But he is calling rand 25 times. Which as each pick is 'independant' of the previous and next picks, it increases the effective entropy.

By way of proof. If you throw a die once, you can only get 1 .. 6; but throw it twice and you can get 1 .. 36.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^3: How likely is rand() to repeat? (srand)
by tye (Sage) on Mar 09, 2012 at 00:08 UTC

    No. Perl's rand() is documented to be completely repeatable when given the same starting point (see srand). So the number of strings possible using just rand() is bound by roughly 2 ** (the number random bits) or, more correctly, the number of effectively different srand() values.

    Update: And every implementation that I've seen uses a simple LCPRNG (Linear Congruential Psuedo-Random Number Generator). So each has a fixed number of possible values and it will simply cycle through those in the exact same order each time if you actually call it 2**$bits times. The number of effectively different seed values is called the "period".

    Update: Oh, forgot to mention that some implementations intentionally use only a subset of the (upper) bits so the period can often be much higher than 2**$bits. For example, a period of 2**32 is quite common while many such implementations only use 16 bits of each result. And I'm not completely sure, in such a case, whether Perl Config reports that as 16 or 32 "random bits".

    Note: Just FYI, the above two updates were posted within a few minutes of the original submission (and thus nearly 2 hours before the first reply).

    - tye        

      So each has a fixed number of possible values and it will simply cycle through those in the exact same order each time if you actually call it 2**$bits times. The number of effectively different seed values is called the "period".

      Sorry tye, but you are very wrong on this, and I can prove it.

      The "period" is the number of values you have to draw before the entire sequence of 2**n-bits values, repeat in the same sequence, in their entirety!

      The Mersenne Twister MT19937, with 32-bit word length, has a period of 219937 - 1. How could that be if your statement above was true?

      Of course, that is not a LCPRNG.

      But, the rand() provided by MSVC is a linear congruential PRNG. And it only produce 15-bits on entropy. Proof:

      C:\test>perl -E"++$h{ int( rand 65536 ) } for 1 .. 1e6; say scalar key +s %h; grep{ $_ & 1 } keys %h or say 'No odd numbers found'" 32768 No odd numbers found

      And for the whole sequence to repeat, the first two values would have to repeat one after the other first. And according to you, that should happen within the first 32768 values produced.

      It doesn't. (With srand(1)):

      C:\test>randperiod -M=2 41 18467 i:1 n:412286284 First sequence of 2 values repeated itself after 412286284 calls to ra +nd

      That's 412 million values generated before the first pair are repeated in sequence.

      Now let's try to match the first 3 values produced:

      C:\test>randperiod -M=3 41 18467 6334 i:2 n:2147418117 First sequence of 3 values repeated itself after 2147418117 calls to r +and

      That's 2 billion values drawn before the first 3 values repeat in sequence.

      The test code:

      #! perl -slw use strict; sub rand32768{ int( rand 32768 ) } $|++; our $M //= 10; srand( 1 ); my @first = map rand32768(), 1 .. $M; print "@first"; my $n = $M; OUTER: while( 1 ) { ++$n until rand32768 == $first[ 0 ]; for my $i ( 1 .. $M - 1 ) { ++$n; printf "\ri:$i n:$n"; redo OUTER unless rand32768() == $first[ $i ]; } last; } print "\nFirst sequence of $M values repeated itself after $n calls to + rand";

      I've tweaked the code to log how many value were drawn for each increasing length sequence. I'll leave it running over night.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

Re^3: How likely is rand() to repeat?
by JavaFan (Canon) on Mar 09, 2012 at 01:05 UTC
    Which as each pick is 'independant' of the previous and next picks, it increases the effective entropy.
    But that's exactly the problem. As I indicated, if rand() would be perfect, it will pick each of the 645345427773512447880377451634304602899218432 different possible string with an equal chance.

    But with most implementations of rand the picks are not independant; the sequence of return values of rand is completely determined by the value of the seed. If srand takes a 64 bit number as argument, then there are at most 264 sequences of return values of rand() possible. But that requires an excellent implementation of rand(), and you got to be lucky enough the mapping to 0..61 doesn't provide duplicates.

      Let's say we were on a 2 bit processor and we had a 2-bit PRNG. There could only be 22 starting points (seeds).

      But the (non-repeating) sequences it could produce are any permutation of the following 24 permutations of the 4 basic values it can produce:

      {0, 1, 2, 3} | {0, 1, 3, 2} | {0, 2, 1, 3} | {0, 2, 3, 1} | {0, 3, 1, 2} | {0, 3, 2, 1} | {1, 0, 2, 3} | {1, 0, 3, 2} | {1, 2, 0, 3} | {1, 2, 3, 0} | {1, 3, 0, 2} | {1, 3, 2, 0} | {2, 0, 1, 3} | {2, 0, 3, 1} | {2, 1, 0, 3} | {2, 1, 3, 0} | {2, 3, 0, 1} | {2, 3, 1, 0} | {3, 0, 1, 2} | {3, 0, 2, 1} | {3, 1, 0, 2} | {3, 1, 2, 0} | {3, 2, 0, 1} | {3, 2, 1, 0}

      Hence, the 32-bit, Mersenne Twister MT19937 can produce 219937 - 1 values (from any given starting point) before it repeats itself exactly.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

        But the (non-repeating) sequences it could produce are any permutation of the following 24 permutations of the 4 basic values it can produce:
        Can you explain how it does that? Given just four different values for the seed, how can you pick from 24, with each element having a chance to be selected?
        Hence, the 32-bit, Mersenne Twister MT19937 can produce 219937 - 1 values (from any given starting point) before it repeats itself exactly.
        Sure. But how many different such sequences can it make? Looking at the pseudo code implementation on Wikipedia, it's all derived from a single, 32-bit seed. Which would limit the number of possible sequences to 232.