BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

  1. If you read the blurb on Mersenne Twister it says: It is proved that the period is 2^19937-1..

    How do they know that?

    Explanations that go beyond "They used math"; and that are written in recognisable English are favoured here :)

  2. More generally, given a Perl function that produces a random number each time it is called, how could you detect when the sequence repeated itself?

    Bear in mind that the sequence might start at any point and could be very long.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: Challenge: Detecting sequences.
by InfiniteSilence (Curate) on May 10, 2013 at 17:58 UTC

    Computer science is little more than a bunch of applied mathematics, so they know that by using mathematical proving methods. There is a paper on this if you are so inclined that explains not only the algorithm but some recognized flaws that the author has since fixed.

    Celebrate Intellectual Diversity

Re: Challenge: Detecting sequences.
by educated_foo (Vicar) on May 10, 2013 at 17:59 UTC
    The NIST page on statistical tests for randomness might be a good place to start.
    Just another Perler interested in Algol Programming.
Re: Challenge: Detecting sequences.
by hdb (Monsignor) on May 10, 2013 at 17:02 UTC
      However, simple generators (unlike the Mersenne Twister) are really deterministic, so if you observe the same number a second time,

      A generator of the integers 0 .. 2, can generate those 3 values in 6 different sequences:

      a: 0 1 2 b: 0 2 1 c: 1 0 2 d: 1 2 0 e: 2 0 1 f: 2 1 0

      And those 6 (sub)sequences can be output in 720 permutations before repetition is required:

      a b c d e f a b c d f e a b c e d f a b c e f d a b c f d e a b c f e d a b d c e f a b d c f e ... f e c d b a f e d a b c f e d a c b f e d b a c f e d b c a f e d c a b f e d c b a

      That's not strictly true, because it is likely that some 18-digit overlaps between two subsequences will replicate an earlier subsequence.

      But then, the last 2 digits of subsequence a, and the first digit of subsequence b; form an out-of-sync repetition of subsequence d. and given that MT can only produce 2**32 values; similar, out-of-sync subsequence repetitions have to have occurred many times to arrive at the 2**19937-1 periodicity.

      So, repetition of a single value, or even a single subsequence is not an indicator that periodicity has been reached.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        SIMPLE RNGs, that I was referring to, are usually working like x[i+1] = a*x[i]+b mod m, with suitable a, b, and m, which cause the sequence to repeat, once a number repeats.

Re: Challenge: Detecting sequences.
by davies (Monsignor) on May 10, 2013 at 21:02 UTC

    Is the given period a Mersenne prime (if it's prime, it is a Mersenne prime) or a multiple of several small other primes?

    If it is prime and any two numbers in the entire sequence are different, the prime is the period. Take two non-random sets of numbers. The first is 1, 2, 1 and the second 1, 2, 1, 2, 1. Even though these are apparently the same, they will give different sequences. The fifth number from the first set will be 2 and from the second 1. Compare this with non-prime sequences (1, 2 and 1, 2, 1, 2). The period of both of these is 2.

    Next, let's imagine that the number generator uses two sets of numbers, drawing one alternately from each set. If the first two sets are used, the period will be 15 because both sets will have to reach the end at the same time before the sequence repeats.

    Now, what is interesting is that using the second pair of sets in the way I have described gives a period of 4, not 2. But that isn't guaranteed. So if the period given is simply the lowest common multiple of the periods of the various PRNGs of shorter period, it might be more correct to say that the period is AT LEAST the number given. Alternatively, they might have been able to look at it in the same way as my second two sets of numbers, but if that is the case I'm afraid that the best explanation I can give is "they used maths".

    Note that throughout this, I have not been testing for randomness but for period length. If you're testing a single prime sequence of this length for randomness you have to use sampling. But if you're testing a number of smaller sequences, each can be tested in its entirety using things like runs tests and chi squared. Note also that I am not a mathematician, so if anyone wants to tear my argument to pieces, I'd be very interested. I did check this logic with a mathematician once and he said that it was correct (a) if I remember what he said correctly and (b) if I explained it correctly. We'd both had a few beers.

    Regards,

    John Davies

    Update: I think the period of my sequence based on the 3 and 5 period sequences is actually 30, not 15. This is because it needs to be multiplied by the number of sequences. But I'll check when I don't need my bed so badly. This might explain why the 2 and 4 number sequences give a combined period of 4.

      Is the given period a Mersenne prime

      Yes (and thank you for your response). But ...

      The value appears to be derived from the fact that the MT retains (and twists) 19937 bits of state: 624 x 32-bit words; of which 31-bits (which move with each iteration) are ignored.

      I've read and re-read the descriptions of the MT (and its "proven" capabilities) many times over the years; but the process by which a periodicity of ~4e6001 can be proven escapes me. The various descriptions of the underlying basis of the proof -- a mechanism termed: inversive decimation -- go way over my head.

      I was hoping, that the first part of my challenge would elicit a layman's description of it, but it seems that large numbers of the (self-proclaimed) mathematicians around here are only such in the same sense as my 9 y/o niece is a violin virtuoso. She can read the notion and play all (within the limitations of her small hands) the notes in time with the metronome, but she is neither Stephane Grappelli nor Yehudi Menuhin.

      The second part of my challenge was actually of most interest to me. I suspect that the use of a Rabin–Karp style rolling checksum might be a practical approach for sequences up to say 1 or 2 billion in length; but I can't see an algorithmic approach that will get much beyond that in a realistic amount of time.

      Hence the first part of my question. I was hoping that if I understood inversive decimation, it might be applicable to the non-deterministic generator I am playing with; but it seems (from the further reading I've done thanks to the links posted in this thread) that ID is only possible due to the special nature of the Mersenne prime bits of state.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Challenge: Detecting sequences.
by LanX (Saint) on May 10, 2013 at 15:59 UTC
    They used math!

    Cheers Rolf

    ( addicted to the Perl Programming Language)