in reply to Re^5: PDL and srand puzzle - support added to MCE v1.891
in thread PDL and srand puzzle

It's possible this is because srandom's dealing with the number of CPUs available is interacting with the way MCE (or at least this use of it) does multi-threading. PDL->random is supposed to have its srandom done in a "main" POSIX thread, and then random is called any number of times in each POSIX thread, using widely separated bits of "randomness" stored centrally for each thread. Therefore, it is highly recommended that you call srandom in the main thread before you create any sort of thread, so that the threads won't all call srandom with extremely similar starting conditions and produce results like the ones you're seeing.

Re^7: PDL and srand puzzle - prior reply not using MCE
by marioroy (Prior) on Jun 06, 2024 at 05:24 UTC
    It's possible this is because srandom's dealing with the number of CPUs available is interacting with the way MCE (or at least this use of it) does multi-threading.

    The thread example given in my prior reply is not using MCE.

      I'm taking CORE::rand() and PDL::random() for a spin without threads, using child processes instead. There are 8 workers, each writing 50,000 lines, so a count below 400,000 indicates duplicates in the output.

      use v5.030;
      use PDL;
      use MCE 1.894;

      MCE->new(
          max_workers => 8,
          user_func => sub {
              for (1..50000) {
                  # my $r = CORE::rand();
                  my $r = PDL->random;
                  MCE->say("$r");
              }
          }
      )->run;

      CORE::rand()

      $ perl test4.pl | LC_ALL=C sort -u | wc -l
      400000
      $ perl test4.pl | LC_ALL=C sort -u | wc -l
      400000
      $ perl test4.pl | LC_ALL=C sort -u | wc -l
      400000

      PDL->random

      $ perl test4.pl | LC_ALL=C sort -u | wc -l
      400000
      $ perl test4.pl | LC_ALL=C sort -u | wc -l
      400000
      $ perl test4.pl | LC_ALL=C sort -u | wc -l
      400000
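
The duplicate check above (compare the `sort -u | wc -l` count against the expected number of lines) can also be sketched in plain Perl. This is only an illustration: `count_unique_lines` is a hypothetical helper name, not part of MCE or PDL, and holding every line in a hash uses far more memory than `sort -u` for multi-million-line output.

```perl
use strict;
use warnings;

# Hypothetical helper (not from MCE or PDL): read lines from a filehandle
# and return the total line count and the number of distinct lines.
sub count_unique_lines {
    my ($fh) = @_;
    my %seen;
    my $total = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $total++;
        $seen{$line}++;
    }
    return ($total, scalar keys %seen);
}

# Demo on in-memory data: four lines, one of them repeated.
my $sample = "0.25\n0.5\n0.25\n0.75\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my ($total, $unique) = count_unique_lines($fh);
close $fh;

# A unique count below the total means duplicates were produced.
print "total=$total unique=$unique duplicates=", $total - $unique, "\n";
```

To use it in a pipeline, pass `\*STDIN` instead of the in-memory handle and pipe the test script's output into it.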

      Next, I tried 12 million lines in a tight loop, appending to a string instead of waiting on serialized output for every line as before. Again, no duplicates.

      use v5.030;
      use PDL;
      use MCE 1.894;

      MCE->new(
          max_workers => 24,
          user_func => sub {
              my $output = "";
              for (1..500000) {
                  # my $r = CORE::rand();
                  my $r = PDL->random;
                  $output .= "$r\n";
              }
              MCE->print($output);
          }
      )->run;

      CORE::rand() and PDL->random

      $ perl test5.pl | LC_ALL=C sort -u | wc -l
      12000000
      $ perl test5.pl | LC_ALL=C sort -u | wc -l
      12000000
      $ perl test5.pl | LC_ALL=C sort -u | wc -l
      12000000

      Sorting takes a while. The parallel mcesort program, with integrated mini-MCE, can help. Copy the script to /usr/local/bin (or a bin path of your choice) and make it executable with sudo chmod +x /usr/local/bin/mcesort.

      perl test5.pl | LC_ALL=C mcesort -j6 -u | wc -l

        The following uses threads for comparison. Locking is required so that output is not garbled; MCE handles this automatically in MCE->say, MCE->print, and MCE->printf. Here, a count below 32 million indicates duplicate lines in the output.

        Edit: etj identified a race condition, hence less uniqueness.

        use v5.030;
        use threads;
        use threads::shared;
        use PDL;

        BEGIN { $PDL::no_clone_skip_warning = 1; }

        my $lock : shared = 0;

        for my $tid (1..64) {
            threads->create(sub {
                my $output = "";
                for (1..500000) {
                    # my $r = CORE::rand();
                    my $r = PDL->random;
                    $output .= "$r\n";
                }
                lock $lock;
                print $output;
            });
        }

        $_->join for threads->list;

        CORE::rand()

        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        32000000
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        32000000
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        32000000

        PDL->random

        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        25105304
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        25304231
        $ perl test6.pl | LC_ALL=C sort -u | wc -l
        25290392

        Improving sort

        As mentioned in my prior post, the parallel mcesort program with integrated mini-MCE resides in a GitHub Gist. Copy the script to /usr/local/bin (or a bin path of your choice) and make it executable with sudo chmod +x /usr/local/bin/mcesort.

        perl test6.pl | LC_ALL=C mcesort -j6 -u | wc -l
      So I think what's happening is I recommended that srandom be called before creating any new threads at all, and you've ignored that and continue to have the same symptoms as before?

        I understood your recommendation not to call srandom inside threads. Basically, I'm reporting that PDL::random() yields fewer unique values than CORE::rand(), regardless of whether srandom is called before spawning threads.

        Edit: etj identified a race condition, hence less uniqueness.

        use v5.030;
        use threads;
        use threads::shared;
        use PDL;

        BEGIN { $PDL::no_clone_skip_warning = 1; }

        my $lock : shared = 0;

        srandom(3); # PDL 2.089_01

        for my $tid (1..16) {
            threads->create(sub {
                my $output = "";
                for (1..500000) {
                    # my $r = CORE::rand();
                    my $r = PDL->random;
                    $output .= "$r\n";
                }
                lock $lock;
                print $output;
            });
        }

        $_->join for threads->list;

        CORE::rand()

        $ perl test7.pl | wc -l
        8000000
        $ perl test7.pl | LC_ALL=C sort -u | wc -l
        8000000
        $ perl test7.pl | LC_ALL=C sort -u | wc -l
        8000000
        $ perl test7.pl | LC_ALL=C sort -u | wc -l
        8000000

        PDL->random

        $ perl test7.pl | wc -l
        8000000
        $ perl test7.pl | LC_ALL=C sort -u | wc -l
        7507936
        $ perl test7.pl | LC_ALL=C sort -u | wc -l
        7446785
        $ perl test7.pl | LC_ALL=C sort -u | wc -l
        7446785