in reply to Re^7: How likely is rand() to repeat?
in thread How likely is rand() to repeat?

If you think I'm wrong, show an algorithm that proves otherwise. Given a 2-bit state, that shouldn't be overly complicated.

2-bits is clumsy. I hope you'll accept an 8-bit rand algorithm that demonstrates a greater than 256 period?

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

{
    my @x = (0x00011011) x 24;
    my $x = 0;

    sub srand8 { $x = $_[0] % 24; }

    sub rand8{
        $x = ++$x % 24;
        $x[ $x ] = ( $x[ $x ] * 33 + 251 ) & 255;
        return $x[ $x ];
    }
}

our $L //= 1e4;
our $S //= 1;

srand8( $S );

my $s = '';
$s .= pack 'C*', map rand8(), 1 .. 256 for 1 .. ($L/256+1);
print length $s;

$s =~ m[(.{256}).*?(\1)]sm
    and print "Sequence at [ $-[1], $-[1] ] repeats at [ $-[2], $+[2] ]";

__END__
C:\test>rand8 -S=1
10240
Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

C:\test>rand8 -S=2
10240
Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

C:\test>rand8 -S=3
10240
Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

C:\test>rand8 -S=4
10240
Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

C:\test>rand8 -S=5
10240
Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

C:\test>rand8 -S=255
10240
Sequence at [ 0, 0 ] repeats at [ 6144, 6400 ]

That 6144 period could probably be improved upon with some time spent tweaking the constants, but it is hardly over-complicated.
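For anyone wanting to check where the 6144 comes from, here is a minimal sketch that measures the output period directly instead of using a regex. The names make_rand8 and output_period are hypothetical helpers, not part of the program above; the update rule and constants are copied from it. The likely explanation is that each of the 24 slots is an 8-bit LCG with a full period of 256 and is visited once every 24 calls, giving 24 * 256 = 6144.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the toy generator as a closure,
# using the same constants and update rule as the program above.
sub make_rand8 {
    my ($seed) = @_;
    my @slot = (0x00011011) x 24;    # every slot starts with the same value
    my $i    = $seed % 24;
    return sub {
        $i = ( $i + 1 ) % 24;
        $slot[$i] = ( $slot[$i] * 33 + 251 ) & 255;
        return $slot[$i];
    };
}

# Smallest p such that out[k] == out[k+p] for every k in the sample.
sub output_period {
    my ($out) = @_;
    my $n = @$out;
    P: for my $p ( 1 .. int( $n / 2 ) ) {
        for my $k ( 0 .. $n - $p - 1 ) {
            next P if $out->[$k] != $out->[ $k + $p ];
        }
        return $p;
    }
    return;    # no period found within the sample
}

my $gen = make_rand8(1);
my @out = map { $gen->() } 1 .. 3 * 6144;
print "period: ", output_period( \@out ), "\n";
```

The sample is three times the expected period so that a genuine repeat can be told apart from a coincidental run of matching values.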

Now, you may have a point if the OP was generating all the passwords he may ever require in his life, in a single run of the program.

Okay. Half way there. :)

That is what I assumed he was doing. I felt (still feel) that was his intent from reading the OP. But, you might be right that he intends generating them piecemeal. Or on-demand.

Using the 32-bit MT, as you've said, there are 2**32 starting points. That's 4e9 starting points into a non-repeating sequence of 4e6001.

Assuming he allows it to self-seed -- no srand() -- even if perchance two of his runs picked adjacent seed-points in the sequence, on average, he'd have to generate 4e6001 / 4e9 = 1e5992 rands before the two sub-sequences would overlap.
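The division above can be checked with exact integers. This is only a sketch of the arithmetic: it takes the MT period as 2**19937 - 1 (the "4e6001"), the seed count as 2**32 (the "4e9"), and it assumes, as the paragraph does, that the starting points are spread evenly around the sequence.

```perl
use strict;
use warnings;
use Math::BigInt;

# Period of the Mersenne Twister: 2**19937 - 1 (the "4e6001" in the text).
my $period = Math::BigInt->new(2)->bpow(19937)->bsub(1);

# Number of distinct 32-bit seeds (the "4e9" in the text).
my $seeds = Math::BigInt->new(2)->bpow(32);

# Average gap between starting points, ASSUMING they are evenly spread.
# Scalar-context bdiv returns the integer quotient.
my $gap = $period->copy->bdiv($seeds);

# A number with d decimal digits is of the order 10**(d-1).
printf "period is of the order 1e%d\n", length( $period->bstr ) - 1;
printf "gap    is of the order 1e%d\n", length( $gap->bstr ) - 1;
```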

So, (ignoring the birthday paradox, imperfect PRNG etc. for a moment), for him to get a dup, he would have to run his program 2**32 times and pick exactly 1 sequence each time. But if he generates 10 each time, that's 10 * 2**32 sequences before he gets a dup.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^9: How likely is rand() to repeat?
by JavaFan (Canon) on Mar 09, 2012 at 14:02 UTC
    I hope you'll accept an 8-bit rand algorithm that demonstrates a greater than 256 period?
    Sure.
    my @x = (0x00011011) x 24;
    But that's not 8-bits. You keep a state using 768 bits. You've no dispute from me that you can create long periods from that. Busy Beavers can go through an amazing number of steps with just very limited memory to keep state on. A trivial counter using a rollover can go through 2^768 values before repeating itself.

    However, considering that you are using an 8-bit seed, all you have are 256 different sequences, regardless of how long they are.
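The seed-size point can be illustrated with Perl's built-in generator. This is a sketch under stated assumptions: the seed is deliberately restricted to 0 .. 255, the 62-symbol alphabet and the helper string_for_seed are hypothetical, and which generator sits behind rand() depends on the perl build. Whatever the internal state size, the number of distinct first strings cannot exceed the number of seeds.

```perl
use strict;
use warnings;

# Assumed alphabet for illustration: 62 symbols.
my @alphabet = ( 'a' .. 'z', 'A' .. 'Z', 0 .. 9 );

# First 25-character string produced after seeding with $seed.
sub string_for_seed {
    my ($seed) = @_;
    srand($seed);
    return join '', map { $alphabet[ rand @alphabet ] } 1 .. 25;
}

# Try every 8-bit seed and count the distinct strings that come out.
my %seen;
$seen{ string_for_seed($_) }++ for 0 .. 255;

printf "%d seeds produced %d distinct first strings\n",
    256, scalar keys %seen;
```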

    Assuming he allows it to self-seed -- no srand() -- even if perchance two of his runs picked adjacent seed-points in the sequence, on average, he'd have to generate 4e6001 / 4e9 = 1e5992 rands before the two sub-sequences would overlap.
    That I do not understand. There are 2^32 seeds. Each of them starts a different sequence. You don't get to start at a random point in the sequence. You could of course keep track of where you are in the sequence, but that requires adding ⌈log2 P⌉ bits to the seed, where P is the length of the period.
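As a small worked example of the ⌈log2 P⌉ figure, the sketch below counts the bits needed to record a position within a period of length P. The helper name position_bits is hypothetical; 6144 is the period of the toy generator discussed earlier in the thread.

```perl
use strict;
use warnings;

# Smallest number of bits b with 2**b >= P, i.e. ceil(log2(P)).
sub position_bits {
    my ($period) = @_;
    my $bits = 0;
    $bits++ while 2**$bits < $period;
    return $bits;
}

printf "P = %5d needs %2d extra bits\n", $_, position_bits($_)
    for 256, 6144;
```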
    So, (ignoring the birthday paradox, imperfect PRNG etc. for a moment), for him to get a dup, he would have to run his program 2**32 times and pick exactly 1 sequence each time. But if he generates 10 each time, that's 10 * 2**32 sequences before he gets a dup.
    I read this as "the more he generates, the more it takes for a duplication to happen". That seems quite counterintuitive to me, and I'm not sure if that's what you mean.
      But that's not 8-bits.

      By that assessment, then neither is MT19937 a "32-bit PRNG", so basing probabilities relating to its use upon 32 bits is wrong also.

      As I've shown with my 8 bit toy rand, you can get more than 8-bits of entropy out of a "headline" 8-bit RCPRNG.

      Equally, the win32 built-in, which is described as a 15-bit generator, cannot be assessed entirely by formulae using 2^15 either, because it has a period of close to 2^31.

      There are 2^32 seeds. Each of them starts a different sequence.

      Are you sure about that?

      Sure it isn't a single, 4e6001 value non-repeating sequence, and all the seeding does is start you at a different place within it?

      I.e. think of the sequence folding back on itself in a circle. The seeding picks a starting point on that circle and then the generator runs around the circle until it finally reaches back to its starting point, when it then repeats.

      Of course, there is no way to prove that for the MT.

      I read this as "the more he generates, the more it takes for a duplication to happen". That seems quite counterintuitive to me, and I'm not sure if that's what you mean.

      Come on. The OP needs 25 rands for each string. If he only picks one set of 25 from each seeded starting position, and there are 2^32 such positions, then he can pick at most 2^32.

      But, we know there are 6.45e44 possible strings. So he'd only have obtained about 6.7e-34% of the possibilities. However, if he grabs 50 values from each start position and builds 2 strings with them, he now has twice as many strings.

      And if he builds 10 strings from each starting point, he has ten times as many strings, but that's still a vanishingly small proportion of the total possibilities: about 6.7e-33%.
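The percentages can be reproduced with a few lines of arithmetic. This sketch assumes that the 6.45e44 figure is 62**25 (25 characters drawn from a 62-symbol alphabet), which matches the number but is not stated in this post; percent_covered is a hypothetical helper.

```perl
use strict;
use warnings;

# ASSUMPTION: 6.45e44 is 62**25, i.e. 25 characters from 62 symbols.
my $total = 62**25;
my $seeds = 2**32;

# Percentage of all possible strings obtained when a fixed number of
# strings is built from each of the 2**32 starting positions.
sub percent_covered {
    my ($strings_per_seed) = @_;
    return 100 * $strings_per_seed * $seeds / $total;
}

printf "%4d per seed: %.2e %% of all strings\n", $_, percent_covered($_)
    for 1, 10, 100, 1000;
```

Even at 1000 strings per starting position the covered fraction stays dozens of orders of magnitude below one percent.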

      So no, I didn't mean what you said. I am saying that you are only limited to 2**32 strings if you only generate 1 string from each seed position.

      But that you can generate 10 (or 100 or 1000) strings from each starting position, thereby producing 10 (or 100 or 1000) * 2^32 strings, and the odds of having produced a duplicate are still "vanishingly small". Slightly higher than if you only pick 1 at each position, but even 10 (or 100 or 1000) times the infinitesimal, is still infinitesimal.

      Minute; way less than micro nano pico femto atto zepto yoctoscopic.



        By that assessment, then neither is MT19937 a "32-bit PRNG", so basing probabilities relating to its use upon 32-bits are wrong also
        MT19937 generates 32-bit numbers. It will not generate more than 2^32 different numbers. It takes 32 bits as a seed. It uses just short of 20k bits to keep state. I don't know what the term "k-bit PRNG" exactly means, which is why I tried avoiding that term and kept using seed and state sizes.
        There are 2^32 seeds. Each of them starts a different sequence.
        Are you sure about that?

        Sure it isn't a single, 4e6001 value non-repeating sequence, and all the seeding does is start you at a different place within it?

        Fine, whatever. Doesn't make an iota of difference to the argument. But if you want to split hairs, be my guest. So you have 2^32 different starting points in the sequence.