Re: New CPAN Module: Math::Random::MT::Auto
by grinder (Bishop) on Jun 22, 2005 at 19:19 UTC
|
I took a quick look at the code, looks pretty nice. My only real comment concerns the _seed function. You specify the different seeding strategies as code blocks with a huge if/elsif/elsif/else chain.
I would tend to split these blocks out into seperate subroutines, e.g. _seed_dev_random, _seed_random_org and so forth. And there's probably a bit of redundant code that could be hoisted out of the /dev/random and /dev/urandom seeding routines.
Also, bear in mind that that on OpenBSD, the two devices are known as /dev/srandom and /dev/random (If I remember correctly, it's been a few years...). FreeBSD also names them slightly differently.
If you adopt this approach, it simplifies the addition of additional seeding strategies with a minimum of fuss. It also opens up the possibility that client code can add their own seeding routine, maybe reading from /dev/audio, or running a one-way hash digest on the gzipped output of top(1) or something or other.
This is probably a better tactic for getting patches sent in for additional seeding routines. Telling people to fill out and array and pass it to the seeds() routine seems a little cumbersome to me. And if the seeds() routine is called without any parameters, you should return the current seed array, so that people have the chance of serialising the state of the generator, in case they wish to "pick up where they left off" in a subsequent program invocation.
Oh, one last thing... you ought to add some statistical tests in your test suite to test that there aren't any glaring non-random errors occurring. Just saying things are between 0 and 1 is a little too simple.
One thing I really liked about the original MT implementation when I looked at it was that you could get a random number back from 0 .. MAXINT. Which is really nice, because then it becomes trivial to generate numbers on the [0,1] interval, rather than [0,1), which is sometimes nice to have. And if you're scaling up to integers anyway, you can save a few cycles by avoiding the initial reciprocal division that way. Being able to get at the raw U32 value would be nice.
- another intruder with the mooring in the heart of the Perl
| [reply] |
|
|
| [reply] |
Re: New CPAN Module: Math::Random::MT::Auto
by tlm (Prior) on Jun 23, 2005 at 17:46 UTC
|
Hi. I had a lot of fun reading the source for this module. Here is some constructive criticism, ranging from the major to the pathologically pedantic.
- PRNG code often uses one or several "magic numbers", like the 624 that appears frequently in the implementation of _seed. As shown in the code given in the previous point, it is both easier to read and to maintain if one puts these into file-scoped variables or constants or accessors. (My preference is to use lexicals with corresponding getters or getter/setters, depending on the desired API.)
- I am confused by this bit in import
elsif ($sym =~ 'auto') {
# To auto-seed or not
$auto_seed = $sym =~ /no|!/;
}
Don't you want $auto_seed to be set to $sym !~ /no|!/ ? If so, I would refactor that to something more like
elsif ($sym =~ /^(no|!)?auto$) {
$auto_seed = not defined $1;
}
...although I must admit that I don't really understand the difference between importing 'noauto' and simply leaving 'auto' out of the import list altogether. If there is no difference, then I'd get rid of the 'noauto' option and change the fragment above to
elsif ($sym eq 'auto') {
# To auto-seed or not
$auto_seed = 1; # default value for $auto_seed should be 0
}
- It might be nice to have access (e.g. through a getter) to the user agent object used when contacting remote sources over HTTP. This would be similar to what LWP::Simple provides, but using an accessor, instead of exporting the package variable.
- IMO the documentation could do a better job in explaining when someone may want to use this module over Perl's standard rand, or vice-versa.
-
I think one should relax strict for the smallest scope that requires it. Hence, in import I'd move the no strict 'refs' to
{
no strict 'refs';
*{"${pkg}::$sym"} = \&$sym;
}
- require doesn't import; hence
eval { require LWP::UserAgent; };
if ($@) {
push(@errors, "Failure loading LWP::UserAgent: $@");
next;
}
Actually, I prefer this idiom:
eval { require LWP::UserAgent }
or do {
push @errors,
"Failure loading LWP::UserAgent: $@";
next;
};
Update: In point 6, I changed a $sym =~ /^auto$/ to $sym eq 'auto'. What was I thinking?
| [reply] [d/l] [select] |
|
|
1. INIT blocks are broken in Perl 5. Use BEGIN blocks instead.
I need the INIT block to initialize the PRNG after the user's seed source
choices are known from the import subroutine. There's no way around it.
I added a cautionary note and sample code to work around this issue if the
user wants to require my module.
2. I understand the desire to make a drop-in replacement for Perl's rand
(and srand), but I think it is a huge missed opportunity not to make this PRNG
available as objects. PRNGs are a special kind of iterator; most, if not all,
of the benefits of iterator objects apply to PRNG objects. Specifically, the
possibility of having multiple independent PRNGs operating in parallel enables
much greater modularization than having a single global PRNG like rand. This
can be a real time saver, specially during testing and debugging.
It's an engineering decision. I opted for speed and simplicity.
Math::Random::MT has an OO interface for the purposes you describe. The only
significant feature it's missing is the ability to save and restore PRNG state.
UPDATE: I have now added a complete OO interface that is still faster than Math::Random::MT, and at the same time maintained the speed improvement on the standalone PRNG. Support for saving and restoring state vectors was also added.
3. I agree that statistical tests are essential for the this module's test
suite.
Statistical testing would flush out two possibilities: Either the algorithm
is bad, or the code is incorrect.
As I mentioned before, there seems to be adequate analysis of the algorithm
to show that it is fine.
As for the code, I compare the output of the PRNG for a known seed against
known output provided by the algorithm's authors. (The test uses 2000 random
number which exercises the shuffle routines through 4 cycles.) If that test
passes, then the code is definitely correct. (Such a deterministic test is, in
fact, much more stringent than a statistical one.)
4. I also agree with the opinion that _seed should be broken up. I would use
a dispatch table...
I have broken up this subroutine. I'll use a dispatch table as well.
5. PRNG code often uses one or several "magic numbers", like the 624 that
appears frequently in the implementation of _seed. As shown in the code given
in the previous point, it is both easier to read and to maintain if one puts
these into file-scoped variables or constants or accessors. (My preference is
to use lexicals with corresponding getters or getter/setters, depending on the
desired API.)
I put in a lexical variable.
6. I am confused by this bit in import
elsif ($sym =~ 'auto') {
# To auto-seed or not
$auto_seed = $sym =~ /no|!/;
}
Eeek! A bug! (See my tag line. ;) Thanks. I'll squash it.
7. It might be nice to have access (e.g. through a getter) to the user agent
object used when contacting remote sources over HTTP. This would be similar to
what LWP::Simple provides, but using an accessor, instead of exporting the
package variable.
As far as I can see, there does not seem to be any need for the user to have
access to the user agent object, and after a seed is acquired, there is
no need to keep it around. I guess I don't understand your point.
8. IMO the documentation could do a better job in explaining when someone
may want to use this module over Perl's standard rand, or vice-versa.
I added a bit of this to the POD.
9. I think one should relax strict for the smallest scope that requires it.
Hence, I'd move the no strict 'refs'<code> to
<code>
{ no strict 'refs'; *{"${pkg}::$sym"} = \&$sym; }
Done.
10. require doesn't import
eval { require LWP::UserAgent; };
if ($@) {
push(@errors, "Failure loading LWP::UserAgent: $@");
next;
}
I think you meant for me to add an import call. I will.
Actually, I prefer this idiom:
eval { require LWP::UserAgent }
or do {
push @errors,
"Failure loading LWP::UserAgent: $@";
next;
};
Not my style, but then again TMTOWTDI. :)
Thanks for the great feedback. I'll upload my changes to CPAN soon.
Remember: There's always one more bug.
| [reply] [d/l] [select] |
Re: New CPAN Module: Math::Random::MT::Auto
by jdhedden (Deacon) on Jul 07, 2005 at 16:37 UTC
|
Now here's something amazing. I just added 64-bit integer support to this module, and the speed went way up! I got a 66% speed improvement!
I don't think it has to do with the 64-bit version of the Mersenne Twister algorithm, as it runs through essentially the same code as the 32-bit version (just using different constants). In fact, the 64-bit version has to refresh the random number buffer twice as often, so it's actually doing more work.
One source of improvement may be better utilization of the CPU (Pentium 4) with everything getting done in 64-bit mode to match the CPU registers.
Another may be that since my Perl is compiled with 'use64bitint', it may be using the 64-bit ints from the PRNG directly, and not having to convert them from 32-bits (output of the old PRNG) to 64-bits (Perl's internal representation). (Not sure about this.)
Regardless, it was totally unexpected. Mayhaps this could be a lesson for other Perl XS hackers.
Remember: There's always one more bug.
| [reply] |