in reply to Re^7: PDL and srand puzzle - prior reply not using MCE
in thread PDL and srand puzzle

So I think what's happening is I recommended that srandom be called before creating any new threads at all, and you've ignored that and continue to have the same symptoms as before?

Replies are listed 'Best First'.
Re^9: PDL and srand puzzle - uniqueness PDL->random vs CORE::rand
by marioroy (Prior) on Jun 06, 2024 at 15:51 UTC

    I understood your recommendation to not call srandom inside threads. Basically, I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand(), regardless if calling srandom before spawning threads.

    Edit: etj identified a race condition, hence less uniqueness.

    use v5.030; use threads; use threads::shared; use PDL; BEGIN { $PDL::no_clone_skip_warning = 1; } my $lock : shared = 0; srandom(3); # PDL 2.089_01 for my $tid (1..16) { threads->create(sub { my $output = ""; for (1..500000) { # my $r = CORE::rand(); my $r = PDL->random; $output .= "$r\n"; } lock $lock; print $output; }); } $_->join for threads->list;

    CORE::rand()

    $ perl test7.pl | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 8000000

    PDL->random

    $ perl test7.pl | wc -l 8000000 $ perl test7.pl | LC_ALL=C sort -u | wc -l 7507936 $ perl test7.pl | LC_ALL=C sort -u | wc -l 7446785 $ perl test7.pl | LC_ALL=C sort -u | wc -l 7446785
      I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand()

      This indicates that random() generates fewer random bits than rand(). Right ??

      With perl, $Config{randbits} should report the number of random bits being generated by rand() - which is 48 on my SP-5.38.2.
      Is the comparable value for PDL's random() function readily available ?

      Cheers,
      Rob
        Can this even return anything else since 5.20? Consistent RNG.
        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        No. It may indicate PDL's random() is being seeded incorrectly.

        PDL's RNG is at https://github.com/PDLPorters/pdl/blob/master/Basic/Primitive/xoshiro256plus.c, and generates 64(!) bits of randomness, of which only 53 are used in an IEEE 754 double-precision number. See the comments in that file for limitations, including that the lowest 3 bits can fail linearity testing, which is not likely to be a problem. Note it generates n (a parameter) seeds, not necessarily just one, so independent POSIX threads will have different seeds if PDL's built-in POSIX threading is used to generate large amounts of random numbers.

        PDL's srandom function, if not given an input, uses PDL::Core::seed - see https://github.com/PDLPorters/pdl/pull/352.

        There is no interaction with Perl's srand() or rand() functionality.

        EDIT - I have just realised that Perl's "threads" functionality might clash with this, if it uses POSIX threads: PDL's RNG has a global vector of n seeds, in a global C variable. PDL's code will, if it is being used in what it thinks is single-threaded mode, use the 0th offset in that. If multiple POSIX threads are accessing that single seed (which gets updated on each RNG generation), there will be a race condition, hence less uniqueness. The workaround for that would be to generate the random numbers in the main thread first, possibly using PDL's multithreading if there are >1e6 elements in the ndarray, then use those suitably divided in the Perl threads.

        This indicates that random() generates fewer random bits than rand(). Right ??

        Possibly not the case, validated here running non-threads, nearly 1 billion unique.

        use v5.030; use Config; use PDL; use Math::MPFR qw(:mpfr); use Math::Random; use Math::Random::MT::Auto; say "# randbits ", $Config{randbits}; say; { say "# CORE::rand()"; CORE::srand(42); for (1..10) { my $r = CORE::rand(); Rmpfr_dump( Math::MPFR->new($r) ); } say; } { say "# PDL->random()"; PDL::srandom(42); # PDL::srand(42) using v2.062 ~ v2.089 for (1..10) { my $r = PDL->random(); Rmpfr_dump( Math::MPFR->new($r->at(0)) ); } say; } { say "# Math::Random::random_uniform()"; Math::Random::random_set_seed(42, 42); for (1..10) { my $r = Math::Random::random_uniform(); Rmpfr_dump( Math::MPFR->new($r) ); } say; } { say "# Math::Random::MT::Auto::rand()"; Math::Random::MT::Auto::set_seed(42); for (1..10) { my $r = Math::Random::MT::Auto::rand(); Rmpfr_dump( Math::MPFR->new($r) ); } say; }

        Output

        # randbits 48 # CORE::rand() 0.10111110100110010011000010111110010100010000000100000E0 0.10101111011101101001000101110110110001101111000000000E-1 0.11100011100000001010111000111001010100010001100000000E-3 0.11011000001111001100111111011000110001011110010000000E-1 0.10100110000111011001110100011100011010001010100000000E-3 0.11011011001111111011001010111111111011111111110000000E0 0.11111111011000101010001101001011001011001010111000000E-1 0.11110101001001110010010110001110010110100010110000000E-1 0.10110000110110010001010110010111111101100110100100000E0 0.11010101101001111110111111100010010000001100000000000E0 # PDL->random() 0.10101111101000001010000100101001111100011011001010100E-3 0.10011110111011100011111000010001111010000100001000111E-1 0.10000000001001001001010111101100010000010100010000110E-3 0.10011100111010000111111110011100100100101010011011100E-1 0.10000000100101010010000101001110111101111011110010101E-163 0.10011011001010110110000001100001001101101100010010111E-1 0.10110011011011001010101001110100111010100100100011001E-3 0.11101000011110111000100001001011101011010100100111010E-2 0.11011100111000000010100111101010011100110011111110001E0 0.10000101111100111111100010010000100100101010000110101E0 # Math::Random::random_uniform() 0.11111111111111110010000110000101111111110110110001010E0 0.11101110000010001101001101011000000100011101110001101E0 0.11000110111111000010100111101100001000110111001111100E-2 0.11111101010101011110101010101000001110110000110111100E0 0.10001101001110100011110110011001110010001010110011110E0 0.11001010101110011011001100000000100111110101000011101E0 0.11101010010000110110000001111111100100001011111000100E0 0.10001110011001110000110000110100100100000110111000100E0 0.10010010110111111100000011011111100010001001011000100E0 0.11011100010001101010110011010010010001101110111011010E-5 # Math::Random::MT::Auto::rand() 0.10000011111100001110111010101101001001100110101101010E0 0.10010000110000000011001110110001110111110111001111010E0 0.11101001010100101010111100101111111000000000110000100E0 0.11100101011001100001101111011111100011000101011101100E-1 0.11111011000000000011000100101100011111100101000000100E-1 0.10011111011111000111001111011010011100010111101000000E-2 0.10100100100000001110100110100110001110110010001100110E0 0.11101001110001010011111001011100111101111101000110010E0 0.11100100100101101001011110101000010101001011010111000E-1 0.10101100001101001001101101000011010011110011101000000E-2
        This indicates that random() generates fewer random bits than rand(). Right ??

        I don't think so. Looking at the produced values was a surprise.

        Took the threads loop and ran it with 16 pids and 1024 random() calls. Used "%.64f" "%.72f" as format for the generated numbers. In most cases, different runs produce identical results.

        <Update>
        For completeness, here is the script.

        #!/usr/bin/perl use v5.030; use threads; use threads::shared; use PDL; BEGIN { $PDL::no_clone_skip_warning = 1; } my $lock : shared = 0; PDL::srand(3); # PDL 2.087 for my $tid (1..16) { threads->create(sub { my $output = ""; for (1..1024) { my $r = random(); $output .= sprintf "%.72f\n", $r; } lock $lock; print $output; }); } $_->join for threads->list;
        </Update>

        When a run produces a different result, I see two scenarios:

        • One or two numbers are missing and are replaced by an already produced value. See examples 2 and 3 below.
        • The generated numbers are significantly different. See example 4.

        $ ./pdl-rand.pl | sort >sort.16384.1 $ ./pdl-rand.pl | sort >sort.16384.2 $ ./pdl-rand.pl | sort >sort.16384.3 $ ./pdl-rand.pl | sort >sort.16384.4 $ sort -u sort.16384.1 |wc -l 16384 $ sort -u sort.16384.2 |wc -l 16383 $ sort -u sort.16384.3 |wc -l 16382 $ sort -u sort.16384.4 |wc -l 16383 $ diff -U 1 sort.16384.1 sort.16384.2 --- sort.16384.1 2024-06-08 09:45:09.466739327 +0200 +++ sort.16384.2 2024-06-08 09:50:24.203356598 +0200 @@ -12562,2 +12562,3 @@ 0.7698795891537290048134423159353900700807571411132812500000000000000 +00000 +0.7698795891537290048134423159353900700807571411132812500000000000000 +00000 0.7699350058887169945265327442029956728219985961914062500000000000000 +00000 @@ -14417,3 +14418,2 @@ 0.8818249357757870221519169717794284224510192871093750000000000000000 +00000 -0.8818467386417550013533173114410601556301116943359375000000000000000 +00000 0.8818802323789649566521120505058206617832183837890625000000000000000 +00000 $ diff -U 1 sort.16384.1 sort.16384.3 --- sort.16384.1 2024-06-08 09:45:09.466739327 +0200 +++ sort.16384.3 2024-06-08 09:52:32.837243809 +0200 @@ -13113,2 +13113,3 @@ 0.8005079965106629558135864499490708112716674804687500000000000000000 +00000 +0.8005079965106629558135864499490708112716674804687500000000000000000 +00000 0.8007896602698270083209308722871355712413787841796875000000000000000 +00000 @@ -13686,3 +13687,2 @@ 0.8361851909041919661547126452205702662467956542968750000000000000000 +00000 -0.8361968170640460273901339860458392649888992309570312500000000000000 +00000 0.8364519442006479454931877626222558319568634033203125000000000000000 +00000 @@ -14417,3 +14417,2 @@ 0.8818249357757870221519169717794284224510192871093750000000000000000 +00000 -0.8818467386417550013533173114410601556301116943359375000000000000000 +00000 0.8818802323789649566521120505058206617832183837890625000000000000000 +00000 @@ -16095,2 +16094,3 @@ 0.9834468387403769717991508514387533068656921386718750000000000000000 +00000 +0.9834468387403769717991508514387533068656921386718750000000000000000 +00000 0.9837232694387190168328061190550215542316436767578125000000000000000 +00000 $ diff -U 1 sort.16384.1 sort.16384.4|head -12 --- sort.16384.1 2024-06-08 09:45:09.466739327 +0200 +++ sort.16384.4 2024-06-08 09:58:27.630459409 +0200 @@ -1,154 +1,161 @@ -0.0000798345195128779969639606917120033813262125477194786071777343750 +00000 +0.0000273717076871849010000561919220274376129964366555213928222656250 +00000 +0.0000550196742872266023331902229376311197484028525650501251220703125 +00000 0.0001113510803039119965475445273028753945254720747470855712890625000 +00000 -0.0001282989599132870110840404231922207145544234663248062133789062500 +00000 0.0002144808569095600127036443938166598854877520352602005004882812500 +00000 -0.0003265905868672370191731213484587215134524740278720855712890625000 +00000 0.0003745585362475710206227319520877472314168699085712432861328125000 +00000 -0.0004606498956905009760365299342765865731053054332733154296875000000 +00000 $ diff -U 0 sort.16384.1 sort.16384.4|grep -v '@@'|wc -l 19698

        I have no idea what's going on here but maybe someone else has.

        Update: Fixed manually modified file names in diff output.

        Greetings,
        🐻

        $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
      I'd have expected calling srandom inside new threads to behave better. Could you take a look at the srand code and see if anything is obviously wrong? Also, are you able to see if the duplicates are in groups i.e. sequences?

        See my reply to Rob. In short, calling PDL::srandom before spawning workers has no effect. The workaround is to call CORE::srand instead.

        See also, non-thread testing.

        Edit: MCE checks for PDL::Primitive->can('srand'), but missed checking PDL::Primitive->can('srandom'). Resolved in MCE v1.894 and MCE::Shared v1.889.