in reply to Check randomly generated numbers have not been used before

Two years late .. but possibly relevant to the next guy with this question.

I was given the job of generating random part numbers for the current products in inventory. Any new products that would be introduced would also need a part number, and of course these needed to be values not yet used. I chose a pseudo-random number generator algorithm P[n+1] = (P[n] + offset) % S where P is the list of part numbers, S is size of part number space, and offset is the first prime number > S/2. Run S times, P[1..S] will contain every number between 0 and S-1 exactly once, in what appears to be a random order. If you stop the process before reaching S, just remember where you stop and start there when you need the next part number. In this application, it was only necessary that the numbers appear random.

But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Replies are listed 'Best First'.
Re^2: Check randomly generated numbers have not been used before
by AnomalousMonk (Archbishop) on Sep 16, 2016 at 17:49 UTC

    This appears to be a Linear congruential generator for which the multiplier parameter a = 1. I'm far and far from being a PRNG expert (I'd bet BrowserUk could write a lengthy article on this topic before his first mug of caffeine-delivery-beverage-of-choice in the morning), but I think you need to exercise some caution in chosing the m a c parameters so that the period of the generator is maximized (i.e., equal to m) and the generator never gets "stuck" at a particular value of Xn. See the linked article. Just glancing at this article, I don't see any case in the "Parameters in common use" table of a = 1, so this may be a red flag.

    Update: After more thought (and looking at pryrt's post), a multiplier a = 1 does seem to meet all the requirements for maximum period length for this type of PRNG, but gives a boring "clock" progression to the output, i.e., the output always advances by the offset mod S. Don't you want the output to bounce around a bit more and at least seem more random? ;)


    Give a man a fish:  <%-{-{-{-<

      In Xn+1 = (a*Xn + c) (mod m): If the offset c is relatively prime with m, you will generate a maximal-length sequence. Since GotToBTru picked the first prime ≥ S/2, it will be maximal with length S, and will guarantee that's there are at most 2 numbers from each period (ie, Xn+1 may not have wrapped around S, but Xn+2 definitely will have).

      You're exactly right about this. I knew this sort of thing had an official title, just didn't know what it was. I've seen this sort of algorithm called a "random number generator", but it is nearly the opposite. Having once issued a value in the space, there is 0% chance it will re-issue it. Well, until you start through the list a second time, in which case it will issue the very same list of numbers in exactly the same order. The "random" function in the BASIC interpreter I used 40+ years ago used something very much like this, and my first attempts at computer games were very disappointing until I figured that out ;)

      But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

        I've seen this sort of algorithm called a "random number generator", but it is nearly the opposite.

        That's the "pseudo" in "Pseudo-Random Number Generator." :)


        Give a man a fish:  <%-{-{-{-<

Re^2: Check randomly generated numbers have not been used before
by BrowserUk (Patriarch) on Sep 16, 2016 at 19:31 UTC

    Whilst the coprimality of parameters ensure uniqueness whilst saving you from have to track previous picks; the method has a weakness: if an attacker can cause you to give him two consecutive picks -- say, by inspecting the latest additions to the parts inventory -- he can determine S, and from that start guessing your new part numbers.

    He doesn't even have to find to consecutive picks; only two that where allocated close to each other. If the difference between them is prime, he's probably found S; if not, he only need find the prime factors of that difference to find it.

    With a little more inspection and analysis, he will be able to determine your offset; and at that point, the benefits of allocating "seemingly random" part numbers is defeated.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Depends on what benefit of "seemingly random" you're looking for. Remember, these are part numbers, used in an inventory system, not part of any cryptographic system. They would function as database keys, so they had to be unique. It was desired that the numbers not be sequential, and that they would span the space. Probably to prevent anyone programming forms or reports from making assumptions about how many digits to allow for part numbers that would cause trouble at a later time. That was my guess at the time.

      I put this together in 1993. My algorithm could still be in use to this day.

      But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

        Depends on what benefit of "seemingly random" you're looking for.

        True. Schemas similar to this are often used to prevent people from guessing IDs for all sorts of purposes.

        Eg. Genomic BLAST servers and similar that perform long running processing as a service, will often generate a random url at which the results will be made available once they are ready. It is obviously important that no two current results sets are allocated the same number.

        To prevent the possibility of industrial or academic espionage; by (for example) someone running a small job at a server known to be used by their target, and then systematically probing the next few hundred or thousand ids looking to find what their targets are working on; the site will generate a url containing a number picked at random from a very large set. This makes the practice of probing urls very inefficient. If the number set is large enough, and the generated urls go away once the results have been downloaded or after some preset time period, the possibility of anyone guessing a url can be made vanishingly small.

        However, many of the schemas based around LCGs have been shown to be easily and quickly cracked.

        A better scheme is to allocate 512MB of space -- a disk file or binary blob -- and use the 4 billion bits to record IDs used. To pick a number, generate a random offset into the blob and then scan forward until you find an unset bit, set it and use the number it represents. To further disguise the value, embed the 10 decimal or 8 hex digits at some known offset within a larger field of randomly chosen digits.

        This is a low-cost, highly reliable method that due to the total absence of any pattern, is almost as good as a one-time pad as far as security is concerned.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
        In the absence of evidence, opinion is indistinguishable from prejudice.