in reply to Re^4: PDL and srand puzzle
in thread PDL and srand puzzle

Hi, etj

PDL->random yields fewer unique values across threads than CORE::rand() does.

Edit: etj identified a race condition, which explains the reduced uniqueness.

    use v5.030;
    use threads;
    use PDL;

    BEGIN { $PDL::no_clone_skip_warning = 1; }

    for my $id (1..4) {
        threads->create(sub {
            for (1..8000) {
                # my $r = CORE::rand();
                my $r = PDL->random;
                say $r;
            }
        });
    }

    $_->join for threads->list;

Thread CORE::rand()

    $ perl test.pl | LC_ALL=C sort | uniq -c | sort -n | tail
       1 0.9999196288117
       1 0.999939390107826
       1 1.52499260792638e-05
       1 5.77532994405772e-05
       1 6.39378495463916e-05
       1 6.94432865593342e-05
       1 7.17426768090945e-05
       1 8.48674303988162e-05
       1 9.10499814921195e-05
       1 9.32151475616649e-05

Thread PDL->random

    $ perl test.pl | LC_ALL=C sort | uniq -c | sort -n | tail
       3 0.578372644755136
       3 0.587725272162442
       3 0.629117050685529
       3 0.665209931957569
       3 0.666012741533792
       3 0.715907985301874
       3 0.780440518879262
       3 0.789289520984441
       3 0.859969220981904
       3 0.975280763021946
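For anyone without the shell tools handy, here is a rough pure-Perl stand-in for the `sort | uniq -c | sort -n | tail` part of the pipeline. It is only a sketch: the name top_counts and the script layout are my own, not part of PDL or of the test script above. A count above 1 in the last rows means the same value was printed more than once.

```perl
use v5.030;

# Count how often each line occurs and return the $n most repeated lines
# as [count, line] pairs, least repeated first (like `sort -n | tail`).
# top_counts is a hypothetical helper name used only for this sketch.
sub top_counts {
    my ($lines, $n) = @_;
    my %count;
    $count{$_}++ for @$lines;
    my @sorted = sort { $count{$a} <=> $count{$b} || $a cmp $b } keys %count;
    splice @sorted, 0, @sorted - $n if @sorted > $n;
    return map { [ $count{$_}, $_ ] } @sorted;
}

# Only read input when run as a script with something piped in.
if (!caller && !-t STDIN) {
    chomp(my @lines = <STDIN>);
    printf "%7d %s\n", @$_ for top_counts(\@lines, 10);
}
```

Saved under a name of your choosing (say counts.pl, a made-up name), it would be used as: perl test.pl | perl counts.pl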

Re^6: PDL and srand puzzle
by etj (Priest) on Jun 06, 2024 at 03:55 UTC
    It's possible this is because srandom's dealing with the number of CPUs available is interacting with the way MCE (or at least this use of it) does multi-threading. PDL->random is supposed to have its srandom done once in a "main" POSIX thread; random can then be called any number of times in each POSIX thread, with each thread using widely separated bits of "randomness" that are stored centrally per thread. I therefore highly recommend that you call srandom in the main thread before you create any sort of thread. Otherwise the threads will all call srandom themselves with extremely similar starting conditions, which would give results similar to what you're seeing.
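A minimal sketch of what that recommendation might look like when applied to the 4-thread test from the parent post. It assumes a threads-enabled perl and a PDL recent enough to export srandom. The names @workers and %seen are just illustrative, and the workers return plain numbers to the main thread rather than printing them. Whether this actually removes the duplicates is the thing to test, so no expected output is shown.

```perl
use v5.030;
use threads;
use PDL;

BEGIN { $PDL::no_clone_skip_warning = 1; }

# Seed once in the main thread, before any thread exists.
# With no argument, srandom is documented to seed from the current time.
srandom();

# Hypothetical layout: 4 workers, 8000 values each, as in the parent post.
my @workers = map {
    threads->create({ context => 'list' }, sub {
        # Return plain Perl numbers rather than ndarrays.
        return map { PDL->random->sclr } 1 .. 8000;
    });
} 1 .. 4;

# Collect everything in the main thread and count repeats.
my %seen;
$seen{$_}++ for map { $_->join } @workers;

my $total = 0;
$total += $_ for values %seen;
my $unique = keys %seen;

say "total=$total unique=$unique";
```

If total and unique come out equal, no value was produced twice. Comparing a run with the srandom() line commented out against a run with it in place should show whether the seeding order is what matters here.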
      It's possible this is because srandom's dealing with the number of CPUs available is interacting with the way MCE (or at least this use of it) does multi-threading.

      The thread example given in my prior reply is not using MCE.

        I'm taking CORE::rand() and PDL->random for a spin without threads, using child processes instead. There are 8 workers, each outputting 50,000 lines. A count below 400,000 indicates duplicates in the output.

        use v5.030;
        use PDL;
        use MCE 1.894;

        MCE->new(
            max_workers => 8,
            user_func => sub {
                for (1..50000) {
                    # my $r = CORE::rand();
                    my $r = PDL->random;
                    MCE->say("$r");
                }
            }
        )->run;

        CORE::rand()

        $ perl test4.pl | LC_ALL=C sort -u | wc -l
        400000
        $ perl test4.pl | LC_ALL=C sort -u | wc -l
        400000
        $ perl test4.pl | LC_ALL=C sort -u | wc -l
        400000

        PDL->random

        $ perl test4.pl | LC_ALL=C sort -u | wc -l
        400000
        $ perl test4.pl | LC_ALL=C sort -u | wc -l
        400000
        $ perl test4.pl | LC_ALL=C sort -u | wc -l
        400000

        Next, I tried 12 million lines (24 workers, 500,000 lines each) in a tight loop, appending to a string and printing it once per worker, so no worker waits on serialized output as in the previous test. Again, no duplicates.

        use v5.030;
        use PDL;
        use MCE 1.894;

        MCE->new(
            max_workers => 24,
            user_func => sub {
                my $output = "";
                for (1..500000) {
                    # my $r = CORE::rand();
                    my $r = PDL->random;
                    $output .= "$r\n";
                }
                MCE->print($output);
            }
        )->run;

        CORE::rand() and PDL->random

        $ perl test5.pl | LC_ALL=C sort -u | wc -l
        12000000
        $ perl test5.pl | LC_ALL=C sort -u | wc -l
        12000000
        $ perl test5.pl | LC_ALL=C sort -u | wc -l
        12000000

        Sorting takes a while. There is the mcesort program, which has a mini-MCE integrated. Copy the script to /usr/local/bin (or a bin path of your choice) and make it executable with sudo chmod +x /usr/local/bin/mcesort.

        perl test5.pl | LC_ALL=C mcesort -j6 -u | wc -l
        So I think what's happening is I recommended that srandom be called before creating any new threads at all, and you've ignored that and continue to have the same symptoms as before?