locked_user sundialsvc4 has asked for the wisdom of the Perl Monks concerning the following question:

Okay, “it’s a long-g-g-g story that doesn’t bear repeating, but,” I have on my hands some legacy code that uses srand() in an effort to produce a repeatable sequence of numbers.   But of course, when you do that, you have thrown away your entropy.   Your “random” numbers won’t really be random again.   This wretched program not only does this, but it does this a lot.   (And it generates a lot of other random numbers, too, depending upon those to be “really random” and running into other troubles precisely because they’re “really not.”   Catch-22.)   The program might run for hours or days at a stretch, and it might re-seed the PRNG a hundred times a minute.

So, is there a way to capture the entropy (seed) state of the built-in random number generator, so that I can restore it after doing the sequence that requires re-seeding?   (I observe that srand() merely returns “1.”)

Replies are listed 'Best First'.
Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by anonymized user 468275 (Curate) on May 03, 2011 at 11:52 UTC
    srand() and srand(0) do not make the same sequence repeat: they cause a random (re)seed to be applied, and calling srand() explicitly for that was only needed prior to Perl 5.004. Only with a parameter >= 1 does the sequence repeat when the same parameter is used again. Under the hood, the parameter is taken as a C unsigned integer. So there is no way to make a sequence initialised with srand() or srand(0) repeat, and for these cases there is nothing to capture.

    One world, one people

      What the code specifically does is to “hash” an identification string into an integer, $n, which is never zero, then it calls srand($n).   This does produce a repeatable sequence.

      My objection, of course, is that once you have re-seeded the built-in PRNG, you can’t return it to its previous seed value ... because, AFAIK, you can’t find out what it is.   Since the PRNG is being constantly re-seeded, the values that it is producing over time are not nearly as random as they need to be throughout the rest of the program.   This is producing a bias problem that is quite serious.

      I guess my real question becomes, “is there a way that I can capture what is the present value of the internal seed, so that, after running the code that requires re-seeding to occur, I can restore the original value and thus preserve randomness?”

      Another “tom twiddy” strategy might be to capture a random number, multiply to make a large integer (because it seems that the seed is an integer?), and then, upon completion of the repeatable-sequence code, srand() again with that value.   The sequence would still show the unmistakable statistical influences caused by the fact that re-seeding is taking place, but at least it would be much more “randomized” than it is right now.   Anyone care to weigh in on that “Plan B”?

      I am definitely going to “sunset” this way of doing things, but unfortunately that means changes to a rather massive, heavily-mirrored database, with consequent down-time that just can’t be considered right now.

        That value is going to be hidden away in the underlying C or C++ installation with which your Perl was compiled at Perl-installation time. It isn't going to be easy to break into. It would be easier to, say, globally replace srand with a call to a wrapper function capsrand that captures the seed before passing it on to the real srand, e.g.:

        package CapSrand;

        sub new {
            bless { LOG => [], LEN => 10 };
        }

        # Record the seed, then pass it on to the real srand.
        sub capsrand {
            my $self = shift;
            $self->{PARM} = shift;
            $self->capture;
            srand( $self->{PARM} );
        }

        sub capture {    # note: LIFO queue
            my $self = shift;
            unshift @{ $self->{LOG} }, $self->{PARM};
            # Keep at most LEN entries in the log.
            $#{ $self->{LOG} } >= $self->{LEN}
                and $#{ $self->{LOG} } = $self->{LEN} - 1;
        }

        # Most recently captured seed, or undef if there is none.
        sub latest {
            my $self = shift;
            $self->{LOG}[0] or undef();
        }

        1;

        Update: to address the question of improving randomness by priming the seed, the usual way is to extract the most random-behaving digits out of a time representation (usually near, but not at, the end of the sequence).

        For example, if the time representation (e.g. in fractions of years since the epoch) is 31.57065428547826, then the temptation is to take the last few digits for a seed. But these are more likely to repeat than, say, the 8th through 11th of the 14 digits after the ".", because the O/S rounds the time to, say, 1/1000 of a second, which spoils the randomness of just the last three digits.

        One world, one people

Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by tchrist (Pilgrim) on May 03, 2011 at 12:42 UTC
    I observe that srand() merely returns “1.”
    According to Perl 5.14’s perlfunc entry for srand:
    The point of the function is to “seed” the rand function so that rand can produce a different sequence each time you run your program. When called with a parameter, srand uses that for the seed; otherwise it (semi-)randomly chooses a seed. In either case, starting with Perl 5.14, it returns the seed.

    Which I believe is what you’re looking for.

      Which I believe is what you’re looking for.
      I don't think so. Suppose he'd captured the seed, then what? This is certainly not what he wants:

      sub func() {
          srand $CONSTANT;
          # ... do stuff using a known sequence ...
      }

      my $seed    = srand;
      my $random1 = rand;
      func();
      srand $seed;    # Get back to "random" sequence again?
      my $random2 = rand;

      Now $random1 and $random2 are the same.

      You'd have to keep track of how many random numbers you have generated, and each time you call srand with the original seed again, you'd have to call rand that many times, discarding the results, before continuing.
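
The bookkeeping described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names start_stream, stream_rand and resume_stream are hypothetical, and the seed is chosen explicitly because srand only returns it from Perl 5.14 on. The replay cost grows with every number drawn, which may rule this out for a program that runs for days.

```perl
use strict;
use warnings;

my $stream_seed;         # seed of the "random" stream (chosen by us)
my $stream_drawn = 0;    # how many values have been taken from it so far

# Start the "random" stream from a seed we pick and remember ourselves.
sub start_stream {
    my ($seed) = @_;
    $stream_seed  = $seed;
    $stream_drawn = 0;
    srand($stream_seed);
    return;
}

# Draw from the "random" stream and count the draw.
sub stream_rand {
    $stream_drawn++;
    return rand();
}

# After other code has re-seeded: replay the stream up to where it was.
sub resume_stream {
    srand($stream_seed);
    for ( 1 .. $stream_drawn ) {
        my $discarded = rand();
    }
    return;
}

start_stream(20110503);
my @before = map { stream_rand() } 1 .. 3;

srand(7);                             # the legacy re-seed
my @known = map { rand() } 1 .. 5;    # the repeatable sequence

resume_stream();
my $next = stream_rand();             # fourth value of the original stream
```

Every draw from the "random" stream has to go through stream_rand for the count to stay correct; a stray call to plain rand would put the replay out of step.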

Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by tye (Sage) on May 03, 2011 at 13:29 UTC
    BEGIN{ *CORE::GLOBAL::srand= sub(;$){}; }

    - tye        

Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by moritz (Cardinal) on May 03, 2011 at 11:39 UTC
Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by JavaFan (Canon) on May 03, 2011 at 13:19 UTC
    So, you have parts of your program that use rand() to generate a known sequence of numbers, and other parts using rand() to generate "random" numbers?

    A couple of solutions.

    • Pre-generate a billion (or whatever you maximally need) numbers for the known sequences. Store them in a database, file, object, whatever, and retrieve them one-by-one.
    • fork. Generate the one set of sequences in a different process than the other.
    • Use a different source for your random numbers (/dev/urandom for instance).
    • Use a daemon that generates the random numbers for you.
    • First generate all the numbers of one type; then the other.
    I'd probably go for the first option.
Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by chrestomanci (Priest) on May 03, 2011 at 12:45 UTC

    Could you override the built-in srand and rand functions with ones of your own that check who the caller is, and choose which source of randomness to use based on the caller?

    For example, you could arrange things so that the parts of your code that need secure cryptographic randomness get bytes from Crypt::Random, while the rest of your code gets standard rand data, that has been tainted by all the calls to srand.

    Alternatively, if the legacy code that is calling srand far too often has little actual need for randomness, then you could consider writing a crude random number generator for its use, while returning standard rand data to all other callers.

    Your crude random number generator could be as simple as a few thousand pre-generated random numbers in a file or database table, where repeated calls just increment through the list, and the value passed in to srand is the fraction through the file to start from.
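
A crude table-driven generator of that kind might look something like the sketch below. The names table_srand and table_rand are hypothetical, and the in-memory table is only a stand-in: it is assumed that a real system would load the same pre-generated numbers from a file or database table so that the sequences repeat across runs.

```perl
use strict;
use warnings;

# Stand-in for the pre-generated file or table: filled once at startup.
my @table = map { rand() } 1 .. 1000;
my $pos   = 0;

# Replacement for srand in the legacy code: the seed picks the start offset.
sub table_srand {
    my ($seed) = @_;
    $pos = $seed % scalar @table;
    return;
}

# Replacement for rand in the legacy code: step through the table,
# wrapping around at the end.
sub table_rand {
    my $value = $table[$pos];
    $pos = ( $pos + 1 ) % scalar @table;
    return $value;
}

table_srand(4242);
my @run1 = map { table_rand() } 1 .. 5;
table_srand(4242);
my @run2 = map { table_rand() } 1 .. 5;    # same five values as @run1
```

Because the legacy code never touches the built-in generator in this arrangement, the rest of the program can keep using rand without being re-seeded.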

      Could you override the built-in srand and rand functions with ones of your own that check who the caller is, and choose which source of randomness to use based on the caller?

      This sounds like a guide "how to make my code highly interconnected, and harder to reuse".

      The point of encapsulation is that code should not care where it's called from.

      Changing the code that reads non-deterministic pseudo random numbers to use Math::Random::MT or something similar would be much saner.

        Since I can’t rely upon “that version of Perl,” my present train(-wreck) of thought is to either bring in a separate random-number object for this purpose, or to do the “re-seed with a random number picked just before re-seeding” strategy described earlier.   (It is good to know that Perl is getting this “return the previous seed value” capability, as many other languages do have it, but it’s too late for this (cr)app.)

        Thanks for your inputs, one and all.   Now, back to the coal-mines.

Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by educated_foo (Vicar) on May 03, 2011 at 18:57 UTC
    Why not just re-seed the RNG after whatever requires a known seed?
    srand($known_thing);
    # code relying on $known_thing
    srand(0);
    Sure, you get a different stream of random numbers, but it's just as random as the one you had before. Heck, you could even make it a function:
    sub with_srand {
        srand(shift);
        $_[0]->(@_[1..$#_]);
        srand();
    }

    # later
    with_srand $known_thing, sub { ... }, args...;

      I do not see clear documentation of what srand(0) is supposed to do (in e.g. Perl 5.10.0).   If perldoc srand on my system says it, I missed it twice.

      However, it clearly does have the following blunt caveats:

      Do not call srand() (i.e. without an argument) more than once in a script.   The internal state of the random number generator should contain more entropy than can be provided by any seed, so calling srand() again actually loses randomness.

      and ...

      Most implementations of srand take an integer and will silently truncate decimal numbers.   This means srand(42) will usually produce the same results as srand(42.1).   To be safe, always pass srand an integer.

      (Emphasis theirs.)

      The strategy I am looking at is more or less like this:

      my $random_now = rand();

      srand($repeatable_seed);
      # (insert repeatable code here)

      srand(MAXINT * $random_now);

      Although it remains that “the internal state of the random number generator (will) contain (less) entropy,” it seems to be the best available compromise, at least with the versions of Perl that are available to me for this purpose, and it involves a small and focused set of code changes.
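
A runnable sketch of that strategy, under some assumptions: with_repeatable_seed is a hypothetical helper name, and 2**31 - 1 stands in for MAXINT (it is an illustrative choice, not a documented limit of srand).

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with the built-in generator seeded to
# $seed, then re-seed from a number drawn before the repeatable section.
sub with_repeatable_seed {
    my ( $seed, $code ) = @_;

    # Draw the follow-up seed from the current stream. The range is an
    # assumption standing in for MAXINT; the result is never zero.
    my $reseed = 1 + int( rand( 2**31 - 1 ) );

    srand($seed);
    my @result = $code->();
    srand($reseed);
    return @result;
}

my @first   = with_repeatable_seed( 12345, sub { map { rand() } 1 .. 3 } );
my $between = rand();    # drawn from the re-seeded "random" stream
my @second  = with_repeatable_seed( 12345, sub { map { rand() } 1 .. 3 } );
```

Here @first and @second hold the same three values, while $between comes from a stream whose seed depended on the generator's state before the first call. The caveat quoted above still applies: each re-seed squeezes the state through a single number.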

      In the long run, of course, I am going to scrap this technique altogether.   I will generate a random sequence of numbers and then simply store that sequence for re-use.   But this legacy-code system is very much in service all across the country right now.   I must dance on eggshells.

Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by JavaFan (Canon) on May 04, 2011 at 00:25 UTC
    You could copy the C code from your system's library that implements rand/srand, and use XS to create functions rand2/srand2, and have either your known sequences or your random sequence generated using rand2/srand2.

    Or perhaps your system has srand48_r and drand48_r. Write some XS to interface with that, and you'll have access to the state that's being kept. So you can have several sequences (known and random).

Re: Legacy code uses "srand()" .. how to avoid losing entropy?
by John M. Dlugosz (Monsignor) on May 10, 2011 at 09:22 UTC
    In the distant past, I've wrapped pseudo-random-number generators in objects (this was in C++) and provided for different instances to exist with their own state data.

    I had stuff that needed to repeat and not be thrown off by other uses of random, so it used its own instance for the purpose.

    You want to simulate that without changing the line-by-line uses, by saving and restoring the global state. Well, if the state behind the built-in srand is not accessible, use Perl techniques to replace CORE::GLOBAL::srand, or a per-module view of srand, so that it calls your own code, which does allow such a thing to be done. Better yet, have the srand replacement be a call to the object version so you can replace old-style calls with new ones incrementally without changing the sequence generated.
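
As a rough pure-Perl illustration of generators that carry their own state, here is a minimal sketch. The class name LocalRand and its methods are hypothetical, and the Park-Miller "minimal standard" recurrence is swapped in only because it is short; it will not reproduce the sequences the legacy code gets from the built-in rand, and a module such as Math::Random::MT (mentioned earlier in the thread) would be a better generator.

```perl
use strict;
use warnings;

package LocalRand;    # hypothetical class name

# Each object carries its own state, so seeding one object never
# disturbs another object or the built-in rand.
sub new {
    my ( $class, $seed ) = @_;
    my $state = $seed % 2147483646 + 1;    # keep state in 1 .. 2**31 - 2
    return bless { state => $state }, $class;
}

# Park-Miller step: state = 16807 * state mod (2**31 - 1).
sub next_rand {
    my ($self) = @_;
    $self->{state} = ( $self->{state} * 16807 ) % 2147483647;
    return $self->{state} / 2147483647;    # strictly between 0 and 1
}

package main;

my $known_a = LocalRand->new(42);
my $known_b = LocalRand->new(42);

my @a = map { $known_a->next_rand() } 1 .. 3;
my $unrelated = rand();                           # built-in generator
my @b = map { $known_b->next_rand() } 1 .. 3;     # same three values as @a
```

The repeatable code would create one object per identification string, leaving the built-in generator for everything that needs to stay unpredictable.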