I humbly submit for my fellow monks' consideration a new CPAN module: Math::Random::MT::Auto. The module implements the Mersenne Twister which is the (current) quintessential pseudo-random number generator (PRNG). It is fast, and has a period of 2^19937 - 1 (> 10^6000).

Key features:

This module culminates my long time interest in PRNGs, and writing it gave me a chance to explore the machinations of putting together a complete CPAN module. I put in the full effort to add a comprehensive POD, thorough testing and even a sample program giving comparative timing results.

This is my first submission to CPAN, and I would appreciate any feedback. (A review posted here on PerlMonks would be fantastic. Any takers?)

UPDATE: The terrific feedback from grinder, tlm, and others inspired me to go the whole nine yards with this module. Version 4.00.00 features:


Remember: There's always one more bug.
  • Comment on New CPAN Module: Math::Random::MT::Auto

Replies are listed 'Best First'.
Re: New CPAN Module: Math::Random::MT::Auto
by grinder (Bishop) on Jun 22, 2005 at 19:19 UTC

    I took a quick look at the code, looks pretty nice. My only real comment concerns the _seed function. You specify the different seeding strategies as code blocks with a huge if/elsif/elsif/else chain.

    I would tend to split these blocks out into seperate subroutines, e.g. _seed_dev_random, _seed_random_org and so forth. And there's probably a bit of redundant code that could be hoisted out of the /dev/random and /dev/urandom seeding routines.

    Also, bear in mind that that on OpenBSD, the two devices are known as /dev/srandom and /dev/random (If I remember correctly, it's been a few years...). FreeBSD also names them slightly differently.

    If you adopt this approach, it simplifies the addition of additional seeding strategies with a minimum of fuss. It also opens up the possibility that client code can add their own seeding routine, maybe reading from /dev/audio, or running a one-way hash digest on the gzipped output of top(1) or something or other.

    This is probably a better tactic for getting patches sent in for additional seeding routines. Telling people to fill out and array and pass it to the seeds() routine seems a little cumbersome to me. And if the seeds() routine is called without any parameters, you should return the current seed array, so that people have the chance of serialising the state of the generator, in case they wish to "pick up where they left off" in a subsequent program invocation.

    Oh, one last thing... you ought to add some statistical tests in your test suite to test that there aren't any glaring non-random errors occurring. Just saying things are between 0 and 1 is a little too simple.

    One thing I really liked about the original MT implementation when I looked at it was that you could get a random number back from 0 .. MAXINT. Which is really nice, because then it becomes trivial to generate numbers on the [0,1] interval, rather than [0,1), which is sometimes nice to have. And if you're scaling up to integers anyway, you can save a few cycles by avoiding the initial reciprocal division that way. Being able to get at the raw U32 value would be nice.

    - another intruder with the mooring in the heart of the Perl

      I would tend to split these blocks out into seperate subroutines...
      I'll do that.
      ... on OpenBSD, the two devices are known as /dev/srandom and /dev/random.... FreeBSD also names them slightly differently.
      I'll investigate this.
      ... opens up the possibility that client code can add their own seeding routine...
      Good idea. I'll add that in.
      And if the seeds() routine is called without any parameters, you should return the current seed array.
      I did make the seed array visible, but this is a better idea.
      ... so that people have the chance of serialising the state of the generator, in case they wish to "pick up where they left off" in a subsequent program invocation.
      The seed and the PRNG's state are different entities. However, I get the idea and I'll add a state() function for this purpose.
      ... you ought to add some statistical tests in your test suite...
      The statistical analysis of the Mersenne Twister is available on the web. The t/mersenne.t test checks the PRGN's workings by checking its output for a known seed. If that test file passes, then the PRGN is working properly. That should cover things.
      ... get a random number back from 0 .. MAXINT.... you can save a few cycles by avoiding the initial reciprocal division.... Being able to get at the raw U32 value would be nice.
      The rand32 function does return 0 .. 2^32-1 (i.e., the raw U32 value). I put it in specifically to avoid the floating point math when it's not needed.

      Thanks for the great feedback.

Re: New CPAN Module: Math::Random::MT::Auto
by tlm (Prior) on Jun 23, 2005 at 17:46 UTC

    Hi. I had a lot of fun reading the source for this module. Here is some constructive criticism, ranging from the major to the pathologically pedantic.

  • PRNG code often uses one or several "magic numbers", like the 624 that appears frequently in the implementation of _seed. As shown in the code given in the previous point, it is both easier to read and to maintain if one puts these into file-scoped variables or constants or accessors. (My preference is to use lexicals with corresponding getters or getter/setters, depending on the desired API.)
  • I am confused by this bit in import
    elsif ($sym =~ 'auto') { # To auto-seed or not $auto_seed = $sym =~ /no|!/; }
    Don't you want $auto_seed to be set to $sym !~ /no|!/ ? If so, I would refactor that to something more like
    elsif ($sym =~ /^(no|!)?auto$) { $auto_seed = not defined $1; }
    ...although I must admit that I don't really understand the difference between importing 'noauto' and simply leaving 'auto' out of the import list altogether. If there is no difference, then I'd get rid of the 'noauto' option and change the fragment above to
    elsif ($sym eq 'auto') { # To auto-seed or not $auto_seed = 1; # default value for $auto_seed should be 0 }
  • It might be nice to have access (e.g. through a getter) to the user agent object used when contacting remote sources over HTTP. This would be similar to what LWP::Simple provides, but using an accessor, instead of exporting the package variable.
  • IMO the documentation could do a better job in explaining when someone may want to use this module over Perl's standard rand, or vice-versa.
  • I think one should relax strict for the smallest scope that requires it. Hence, in import I'd move the no strict 'refs' to
    { no strict 'refs'; *{"${pkg}::$sym"} = \&$sym; }
  • require doesn't import; hence
    eval { require LWP::UserAgent; }; if ($@) { push(@errors, "Failure loading LWP::UserAgent: $@"); next; }
    Actually, I prefer this idiom:
    eval { require LWP::UserAgent } or do { push @errors, "Failure loading LWP::UserAgent: $@"; next; };
  • Update: In point 6, I changed a $sym =~ /^auto$/ to $sym eq 'auto'. What was I thinking?

    the lowliest monk

      1. INIT blocks are broken in Perl 5. Use BEGIN blocks instead.
      I need the INIT block to initialize the PRNG after the user's seed source choices are known from the import subroutine. There's no way around it.

      I added a cautionary note and sample code to work around this issue if the user wants to require my module.

      2. I understand the desire to make a drop-in replacement for Perl's rand (and srand), but I think it is a huge missed opportunity not to make this PRNG available as objects. PRNGs are a special kind of iterator; most, if not all, of the benefits of iterator objects apply to PRNG objects. Specifically, the possibility of having multiple independent PRNGs operating in parallel enables much greater modularization than having a single global PRNG like rand. This can be a real time saver, specially during testing and debugging.
      It's an engineering decision. I opted for speed and simplicity.

      Math::Random::MT has an OO interface for the purposes you describe. The only significant feature it's missing is the ability to save and restore PRNG state.

      UPDATE: I have now added a complete OO interface that is still faster than Math::Random::MT, and at the same time maintained the speed improvement on the standalone PRNG. Support for saving and restoring state vectors was also added.

      3. I agree that statistical tests are essential for the this module's test suite.
      Statistical testing would flush out two possibilities: Either the algorithm is bad, or the code is incorrect.

      As I mentioned before, there seems to be adequate analysis of the algorithm to show that it is fine.

      As for the code, I compare the output of the PRNG for a known seed against known output provided by the algorithm's authors. (The test uses 2000 random number which exercises the shuffle routines through 4 cycles.) If that test passes, then the code is definitely correct. (Such a deterministic test is, in fact, much more stringent than a statistical one.)

      4. I also agree with the opinion that _seed should be broken up. I would use a dispatch table...
      I have broken up this subroutine. I'll use a dispatch table as well.
      5. PRNG code often uses one or several "magic numbers", like the 624 that appears frequently in the implementation of _seed. As shown in the code given in the previous point, it is both easier to read and to maintain if one puts these into file-scoped variables or constants or accessors. (My preference is to use lexicals with corresponding getters or getter/setters, depending on the desired API.)
      I put in a lexical variable.
      6. I am confused by this bit in import
      elsif ($sym =~ 'auto') { # To auto-seed or not $auto_seed = $sym =~ /no|!/; }
      Eeek! A bug! (See my tag line. ;) Thanks. I'll squash it.
      7. It might be nice to have access (e.g. through a getter) to the user agent object used when contacting remote sources over HTTP. This would be similar to what LWP::Simple provides, but using an accessor, instead of exporting the package variable.
      As far as I can see, there does not seem to be any need for the user to have access to the user agent object, and after a seed is acquired, there is no need to keep it around. I guess I don't understand your point.
      8. IMO the documentation could do a better job in explaining when someone may want to use this module over Perl's standard rand, or vice-versa.
      I added a bit of this to the POD.
      9. I think one should relax strict for the smallest scope that requires it. Hence, I'd move the
      no strict 'refs'<code> to <code> { no strict 'refs'; *{"${pkg}::$sym"} = \&$sym; }
      Done.
      10. require doesn't import
      eval { require LWP::UserAgent; }; if ($@) { push(@errors, "Failure loading LWP::UserAgent: $@"); next; }
      I think you meant for me to add an import call. I will.
      Actually, I prefer this idiom:
      eval { require LWP::UserAgent } or do { push @errors, "Failure loading LWP::UserAgent: $@"; next; };
      Not my style, but then again TMTOWTDI. :)

      Thanks for the great feedback. I'll upload my changes to CPAN soon.


      Remember: There's always one more bug.
Re: New CPAN Module: Math::Random::MT::Auto
by jdhedden (Deacon) on Jul 07, 2005 at 16:37 UTC
    Now here's something amazing. I just added 64-bit integer support to this module, and the speed went way up! I got a 66% speed improvement!

    I don't think it has to do with the 64-bit version of the Mersenne Twister algorithm, as it runs through essentially the same code as the 32-bit version (just using different constants). In fact, the 64-bit version has to refresh the random number buffer twice as often, so it's actually doing more work.

    One source of improvement may be better utilization of the CPU (Pentium 4) with everything getting done in 64-bit mode to match the CPU registers.

    Another may be that since my Perl is compiled with 'use64bitint', it may be using the 64-bit ints from the PRNG directly, and not having to convert them from 32-bits (output of the old PRNG) to 64-bits (Perl's internal representation). (Not sure about this.)

    Regardless, it was totally unexpected. Mayhaps this could be a lesson for other Perl XS hackers.


    Remember: There's always one more bug.