in reply to Re^15: How likely is rand() to repeat?
in thread How likely is rand() to repeat?

More to your satisfaction?

#! perl -slw use strict; use Math::Random::MT qw[ rand srand ]; $|++; my @c = ( 'A'..'Z', 'a'..'z', 0..9 ); for my $run ( 1 .. 1e6 ) { my %seen; keys %seen = 1e6; printf "\rrun: $run"; for my $id ( 1 .. 1e5 ) { srand( $run * $id ); for( 1 .. 10 ) { my $ID = join '', map $c[ int rand( 62 ) ], 1 .. 25; warn "Dup $ID at $run/$id" if exists $seen{ $ID }; undef $seen{ $ID }; } } } __END__ C:\test>"IDrand(62)x25.pl" run: 54

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^17: How likely is rand() to repeat?
by JavaFan (Canon) on Mar 09, 2012 at 23:47 UTC
    More to your satisfaction?
    There's no need to satisfy me.

    I just don't see what point you're trying to make. You've been arguing against my viewpoint that having a seed of k bits cannot lead to more than 2k sequences. I fail to see how your exercise supports your argument, or contradicts mine.

      {Sigh}. All the way back to the beginning. Your original post said:

      Your random number generator is likely to only have 24, 32, 48 or 64 bits of entropy. Which means that most of the possible strings will never be generated. Even with 64 bits of entropy, there are only 18446744073709551616 strings possible.

      That strongly (and wrongly) implies that if your rand() produces 64-bit values and takes a 64-bit seed, that it only has 64-bits of entropy.

      As I demonstrated with my toy 8-bit, the MSVC 31-bit, and MT19937 PRNGs, neither nor both of the former, imply the latter.

      It also stated:

      Which makes is far more likely for collisions to happen.

      But the difference between 0.00000000000000000000000000000000066% & 0.0000000000000000000000000000000066% hardly constitutes "far more likely".

      Then a couple of posts later you said:

      If srand takes a 64 bit number as argument, then there are at most 2^64 sequences of return values of rand() possible.

      Which strongly (and wrongly) implies that the OP could only get 2^64 unique strings using a rand that uses a 64-bit argument to srand.

      As I've demonstrated, that is only true if he only generates 1 string per starting point. Producing two per, doubles it. Etc.

      Your first post also stated:

      In fact, Wikipedia tells us that with a 64 bit hash, there's a one procent chance you have a collision if you have 6.1E8 strings.

      This time, strongly stating, that the OP can expect a collision after generating 610 million keys. Which is wrong in so many ways.

      • The math of a 64-bit hash is completely unrelated to a rand(). Especially one with 640 dwords of state.

        The odds for even a straight 64-bit rand are far higher.

      • By diminution, implies that 1/610 million is somehow high risk.

        Even were it true, it would be "vanishingly small" for an application that only needs < 1 million strings.

      The purpose of this exercise (the code above) was to demonstrate, in a way that anyone could reproduce, that the scare-mongering you've done in this thread has no basis in reality. I hoped that by inviting your opinion/modifications to the demonstration code, to finally end this thread with a tangible conclusion on which we might both agree, rather than the most unsatisfactory one of disparate opinions and bunch of very big/very small numbers.

      Beyond reporting back on the results of running the last version of the demo code, I think that this thread has nowhere left to go.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?