SleepyJay has asked for the wisdom of the Perl Monks concerning the following question:

I have this programming assignment due today (midnight) in my Graduate level Algorithms class. (*I'm NOT looking for advice on the assignment*). I've been able to choose my language all year and, of course, I choose Perl.

I'm producing thousands of largish random integers, by way of calling int(rand(10**12)) thousands of times, to use as test data (as per the assignment).

The assignment is testing various hueristics for the Number Partition problem; basically take a list of numbers and find the best way to break the list into two new lists, each having approximately the same sum (it's NP-Complete).

My problem is that I get a lot of answers for the difference between the two sums that are either 0, 1, (ok so far) or "related to" the value 30,517,579. When I say "related to" I mean that the values are like +/- 10 away from this value when I use the "good" algorithms or close to a multiple of this when I use the bad algorithms (+/- some multiple of 10).

I noticed that 30,517,579 / 10**12 ~= 2**15. Of course, 2**15 is a signed-2-byte-sized integer, which is very suspicious.

I'm running ActivePerl on WinXP, so I feel as if I should blame one or both of them (leaning towards WinXP).

Does anyone have any insight on this? And/Or can anyone point me in the direction of a random number generator that is: 1). As fast (or faster) as rand (i.e. TrulyRandom no good)
2). Can return 10**12-sized values (i.e. not Math::Random)
3). Is Perl and Windows compatible (i.e. no time to fiddle with C or Unix/Linux at this point)
4). Not likely to return foolish data like above

Good luck, right? I'm going to mention the fact that I've consulted experts about this in my write-up.

thanks.

Replies are listed 'Best First'.
Re: Rand Wackiness!
by BrowserUk (Patriarch) on May 11, 2006 at 21:07 UTC

    Perl's rand function is no good for producing values >32767 (on AS perl at least).

    See Math::Random::MT for a drop-in, rand replacement that should meet your needs. It'll be a little slower.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Just out of curiosity, do you know where/why this happens?

      My understanding of BigInt routines is limited, but it is literally doing something with 32768 (aka 2**15) that makes this happen?

      thanks
      jason.

        The rand function (on Win32) only produces values in the range 0 .. 32767*.

        If you multiply that by any factor, you will get multiples of the base values. E.g.

        #! perl -slw use strict; use Data::Dumper; my %hash; $hash{ int rand() * 65536 }++ for 1 .. 1e6; print "$_ => $hash{ $_ }" for sort{ $a <=> $b } keys %hash; __END__ 0 => 26 2 => 30 4 => 26 6 => 32 8 => 33 10 => 23 ... 65520 => 33 65522 => 28 65524 => 26 65526 => 24 65528 => 23 65530 => 26 65532 => 43 65534 => 26

        Note how there are no odd values produced. The multiplier is a twice the base, so all you get are multiples of two.

        If the multiplier is 3* the base, you only get powers of 3. And so on.

        For non-exact multipliers, the pattern is less obvious, but there will still be (lots) of values that will never be produced.

        Math::Random::MT has a much larger base, a huge periodicity, it's written in XS for reasonable speed and is available as a PPM from AS. Personally, I think it should have been included in Perl to replace CRT PRNGs years ago.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        To quote the documentation for the rand function:

        (Note: If your rand function consistently returns numbers that are too large or too small, then your version of Perl was probably compiled with the wrong number of RANDBITS.)

        emc

        "Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it. "
        —G. Steele
Re: Rand Wackiness!
by ikegami (Patriarch) on May 11, 2006 at 22:15 UTC

    There's no new solution in this post. I just felt the thread could use some clarity. For starters, there are two problems at play here.

    log2(10**12) is slightly less than 40. That means your random number generator must be capable of producing 40 bit random numbers. According to BrowserUK, Active State's generator produces 15 bit numbers.

    It also means you need to be capable of receiving 40bit numbers from the random number generator. perl's floats have sufficient precision to hold a 40 bit number, so you'll have to find a random number generator that returns 40+ bit floats. Alternatively, you can increase the precison of your integers (by using Math::BigInt, for example) and using a random number generator capable of returning numbers in that format (like Zaxo's solution, for example).

      According to BrowserUK, Active State's generator produces 15 bit numbers.
      As you can verify for yourself, with
      use Config; print $Config{randbits};

        How annoying that must be: perl on my Debian systems reports 48 bits.

        print substr("Just another Perl hacker", 0, -2);
        - apotheon
        CopyWrite Chad Perrin

      I did go with Math::Random::MT and it seemed to work pretty well. Of course, I don't really have time to question it at this point. ;)

      thanks
      jason.

        Be reassured. The Mersenne Twister algorithm is very well documented and tested. And the Perl implementation is well tried and tested.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        Hmm. Something's been niggling me since my previous reply to this node.

        It has dawned on you that the MT is still only capable of producing 32-bits of output (on a 32-bit processor), and for the same reasons that scaling the standard rand to produce numbers > 32767 is flawed, scaling the output of MT to produce numbers > 2**32 is equally flawed?

        I just looked back at your OP and you say you were looking for random numbers in the range 10**12. That is roughly 2**44. Ie. > 2**32. Again, there will be a very limited range of values produced from your overall range requirements if you scale MT::rand() to that extent--2**32 / 2**44 * 100 ~= .025%.

        In order to ensure that all the numbers in your range are possible, it becomes neccessary to combine 2 calls to MT::rand() to get enough bits.

        If you generate a random integer in the range 0 .. 2**32-1 and multiply it by 2**32, then you have the potential to produce all the exact powers of 2**32 from 0 .. 2**63. If you then add a second random integer in the range 2**32-1, you have the potential to produce any integer in the range 2**64-1. You can then scale that value to produce integers in the range you require:

        ( r1 * 2**32 + r2 ) * MAX / 2**64 => 0 .. MAX

        The problem is that performing math greater than 2**53 bits in native Perl will introduce errors through IEEE floating point representation errors. You could move to using Math::BigInt, but then you pay the performance penalty.

        An alternative for numbers less than ~ 2**50, is to produce two random integers in the range 0 .. 2**(log2MAX) and combine them by the shifting & adding them as above. Then, when you scale the result, it should keep the math within the bounds of Perl's integer float precision and prevent (most) of the floating point representation errors.

        This bigRand() appears to produce results that are:

        1. within the correct range.
        2. evenly distributed within normal expectations of random numbers.

        The majority of my testing has (of necessity) been confined to using bigRand() to produce fairly small random integers (1e3 through 1e6). The run times and memory/diskspace required for adaquately testing of larger ranges has prohibited extensive testing of these. However, testing at the smaller ranges should be enough to confirm that the math is reasonable, and given the already extensive testing performed upon the MT, then if the math is correct, the results should scale appropriately.

        Please, if the randomness of the output are very important in your work, perform your own testing, to your own satisfaction, before considering this code.

        use Math::Random::MT qw[ srand rand ]; use POSIX qw[ frexp ]; sub bigRand { my $max = shift; my $exp = scalar frexp( $max ); ++$exp if $exp &1; ## Update: Didn't work if exponent +was odd. my $halfbits = $exp / 2; my @rand = map{ rand 2**$halfbits } 1 .. 2; my $rand = ( $rand[ 0 ] * 2**$halfbits ) + $rand[ 1 ]; ## Updated: Use * not << for floats! my $rv = int( $rand * ( $max / 2**$exp ) ); ## Update: Less roundo +ff }

        Does anyone have any suggestions for how to go about testing this for the higher ranges? There are essentially two problems.

        The first is simply the runtime required to produce enough random values in the range (say) 0 .. 10e12. I think it would require at least 10e14 or even 10e15 values to even begin to allow statistical verification. And it's not something that you could do as a distributed process either I think?

        The second is how to accumulate 10e12 counts?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Rand Wackiness!
by jdhedden (Deacon) on May 11, 2006 at 21:02 UTC
    You can get random ints directly using Math::Random::MT::Auto, and you can limit their range using the included Math::Random::MT::Auto::Range class. This module is based on the Mersenne Twister PRNG, and should not exhibit any of the anomalies you're noting with Perl's build-int rand().

    Remember: There's always one more bug.
Re: Rand Wackiness!
by Zaxo (Archbishop) on May 11, 2006 at 21:54 UTC

    Here's the flat boneheaded way; get them as strings,

    $ perl -e'for (1..10) { printf "%04d", int rand 10000 for 1..3; print +$/ }' 514418728718 307337438051 439351593615 060710058137 425610029405 005219485294 477480663085 055060660839 412479203930 462907707151 $
    You can use sprintf and concatenation if you want them in memory. That's not entirely dumb; you can initialize Math::BigInts with them.

    Test these, too. PRNG's often show periodicity in the low bits, and left-shifting them is asking for trouble.

    After Compline,
    Zaxo

Re: Rand Wackiness!
by ptum (Priest) on May 11, 2006 at 20:45 UTC
Re: Rand Wackiness!
by GrandFather (Saint) on May 11, 2006 at 21:03 UTC

    Super Search using "random number" as the title search string and check the SoPW check box only. Put "Re: Re^" in the skip titles field. This sort of question has cropped up many many times in the past.


    DWIM is Perl's answer to Gödel
Re: Rand Wackiness!
by gam3 (Curate) on May 11, 2006 at 20:56 UTC
    You might just try this:
    perl -e 'print int(rand() * 10**15)
    This will solve the problem if it is in the rand() function.
    -- gam3
    A picture is worth a thousand words, but takes 200K.