Re^9: PDL and srand puzzle - uniqueness PDL->random vs CORE::rand
by marioroy (Prior) on Jun 06, 2024 at 15:51 UTC
|
I understood your recommendation to not call srandom inside threads. Basically, I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand(), regardless if calling srandom before spawning threads.
Edit: etj identified a race condition, hence less uniqueness.
use v5.030;
use threads;
use threads::shared;
use PDL;
BEGIN { $PDL::no_clone_skip_warning = 1; }
my $lock : shared = 0;
srandom(3); # PDL 2.089_01
for my $tid (1..16) {
threads->create(sub {
my $output = "";
for (1..500000) {
# my $r = CORE::rand();
my $r = PDL->random;
$output .= "$r\n";
}
lock $lock;
print $output;
});
}
$_->join for threads->list;
CORE::rand()
$ perl test7.pl | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
8000000
PDL->random
$ perl test7.pl | wc -l
8000000
$ perl test7.pl | LC_ALL=C sort -u | wc -l
7507936
$ perl test7.pl | LC_ALL=C sort -u | wc -l
7446785
$ perl test7.pl | LC_ALL=C sort -u | wc -l
7446785
| [reply] [d/l] [select] |
|
I'm reporting that PDL::random() results in lesser uniqueness versus CORE::rand()
This indicates that random() generates fewer random bits than rand(). Right ??
With perl, $Config{randbits} should report the number of random bits being generated by rand() - which is 48 on my SP-5.38.2.
Is the comparable value for PDL's random() function readily available ?
Cheers, Rob
| [reply] |
|
Can this even return anything else since 5.20? Consistent RNG.
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] |
|
|
No. It may indicate PDL's random() is being seeded incorrectly.
PDL's RNG is at https://github.com/PDLPorters/pdl/blob/master/Basic/Primitive/xoshiro256plus.c, and generates 64(!) bits of randomness, of which only 53 are used in an IEEE 754 double-precision number. See the comments in that file for limitations, including that the lowest 3 bits can fail linearity testing, which is not likely to be a problem. Note it generates n (a parameter) seeds, not necessarily just one, so independent POSIX threads will have different seeds if PDL's built-in POSIX threading is used to generate large amounts of random numbers.
PDL's srandom function, if not given an input, uses PDL::Core::seed - see https://github.com/PDLPorters/pdl/pull/352.
There is no interaction with Perl's srand() or rand() functionality.
EDIT - I have just realised that Perl's "threads" functionality might clash with this, if it uses POSIX threads: PDL's RNG has a global vector of n seeds, in a global C variable. PDL's code will, if it is being used in what it thinks is single-threaded mode, use the 0th offset in that. If multiple POSIX threads are accessing that single seed (which gets updated on each RNG generation), there will be a race condition, hence less uniqueness. The workaround for that would be to generate the random numbers in the main thread first, possibly using PDL's multithreading if there are >1e6 elements in the ndarray, then use those suitably divided in the Perl threads.
| [reply] [d/l] [select] |
|
|
|
|
use v5.030;
use Config;
use PDL;
use Math::MPFR qw(:mpfr);
use Math::Random;
use Math::Random::MT::Auto;
say "# randbits ", $Config{randbits};
say;
{
say "# CORE::rand()";
CORE::srand(42);
for (1..10) {
my $r = CORE::rand();
Rmpfr_dump( Math::MPFR->new($r) );
}
say;
}
{
say "# PDL->random()";
PDL::srandom(42); # PDL::srand(42) using v2.062 ~ v2.089
for (1..10) {
my $r = PDL->random();
Rmpfr_dump( Math::MPFR->new($r->at(0)) );
}
say;
}
{
say "# Math::Random::random_uniform()";
Math::Random::random_set_seed(42, 42);
for (1..10) {
my $r = Math::Random::random_uniform();
Rmpfr_dump( Math::MPFR->new($r) );
}
say;
}
{
say "# Math::Random::MT::Auto::rand()";
Math::Random::MT::Auto::set_seed(42);
for (1..10) {
my $r = Math::Random::MT::Auto::rand();
Rmpfr_dump( Math::MPFR->new($r) );
}
say;
}
Output
# randbits 48
# CORE::rand()
0.10111110100110010011000010111110010100010000000100000E0
0.10101111011101101001000101110110110001101111000000000E-1
0.11100011100000001010111000111001010100010001100000000E-3
0.11011000001111001100111111011000110001011110010000000E-1
0.10100110000111011001110100011100011010001010100000000E-3
0.11011011001111111011001010111111111011111111110000000E0
0.11111111011000101010001101001011001011001010111000000E-1
0.11110101001001110010010110001110010110100010110000000E-1
0.10110000110110010001010110010111111101100110100100000E0
0.11010101101001111110111111100010010000001100000000000E0
# PDL->random()
0.10101111101000001010000100101001111100011011001010100E-3
0.10011110111011100011111000010001111010000100001000111E-1
0.10000000001001001001010111101100010000010100010000110E-3
0.10011100111010000111111110011100100100101010011011100E-1
0.10000000100101010010000101001110111101111011110010101E-163
0.10011011001010110110000001100001001101101100010010111E-1
0.10110011011011001010101001110100111010100100100011001E-3
0.11101000011110111000100001001011101011010100100111010E-2
0.11011100111000000010100111101010011100110011111110001E0
0.10000101111100111111100010010000100100101010000110101E0
# Math::Random::random_uniform()
0.11111111111111110010000110000101111111110110110001010E0
0.11101110000010001101001101011000000100011101110001101E0
0.11000110111111000010100111101100001000110111001111100E-2
0.11111101010101011110101010101000001110110000110111100E0
0.10001101001110100011110110011001110010001010110011110E0
0.11001010101110011011001100000000100111110101000011101E0
0.11101010010000110110000001111111100100001011111000100E0
0.10001110011001110000110000110100100100000110111000100E0
0.10010010110111111100000011011111100010001001011000100E0
0.11011100010001101010110011010010010001101110111011010E-5
# Math::Random::MT::Auto::rand()
0.10000011111100001110111010101101001001100110101101010E0
0.10010000110000000011001110110001110111110111001111010E0
0.11101001010100101010111100101111111000000000110000100E0
0.11100101011001100001101111011111100011000101011101100E-1
0.11111011000000000011000100101100011111100101000000100E-1
0.10011111011111000111001111011010011100010111101000000E-2
0.10100100100000001110100110100110001110110010001100110E0
0.11101001110001010011111001011100111101111101000110010E0
0.11100100100101101001011110101000010101001011010111000E-1
0.10101100001101001001101101000011010011110011101000000E-2
| [reply] [d/l] [select] |
|
|
|
|
|
|
This indicates that random() generates fewer random bits than rand(). Right ??
I don't think so. Looking at the produced values was a surprise.
Took the threads loop and ran it with 16 pids and 1024 random() calls.
Used "%.64f" "%.72f" as format for the generated numbers.
In most cases, different runs produce identical results.
<Update>
For completeness, here is the script.
#!/usr/bin/perl
use v5.030;
use threads;
use threads::shared;
use PDL;
BEGIN { $PDL::no_clone_skip_warning = 1; }
my $lock : shared = 0;
PDL::srand(3); # PDL 2.087
for my $tid (1..16) {
threads->create(sub {
my $output = "";
for (1..1024) {
my $r = random();
$output .= sprintf "%.72f\n", $r;
}
lock $lock;
print $output;
});
}
$_->join for threads->list;
</Update>
When a run produces a different result, I see two scenarios:
- One or two numbers are missing and are replaced by an already produced value. See examples 2 and 3 below.
- The generated numbers are significantly different. See example 4.
$ ./pdl-rand.pl | sort >sort.16384.1
$ ./pdl-rand.pl | sort >sort.16384.2
$ ./pdl-rand.pl | sort >sort.16384.3
$ ./pdl-rand.pl | sort >sort.16384.4
$ sort -u sort.16384.1 |wc -l
16384
$ sort -u sort.16384.2 |wc -l
16383
$ sort -u sort.16384.3 |wc -l
16382
$ sort -u sort.16384.4 |wc -l
16383
$ diff -U 1 sort.16384.1 sort.16384.2
--- sort.16384.1 2024-06-08 09:45:09.466739327 +0200
+++ sort.16384.2 2024-06-08 09:50:24.203356598 +0200
@@ -12562,2 +12562,3 @@
0.7698795891537290048134423159353900700807571411132812500000000000000
+00000
+0.7698795891537290048134423159353900700807571411132812500000000000000
+00000
0.7699350058887169945265327442029956728219985961914062500000000000000
+00000
@@ -14417,3 +14418,2 @@
0.8818249357757870221519169717794284224510192871093750000000000000000
+00000
-0.8818467386417550013533173114410601556301116943359375000000000000000
+00000
0.8818802323789649566521120505058206617832183837890625000000000000000
+00000
$ diff -U 1 sort.16384.1 sort.16384.3
--- sort.16384.1 2024-06-08 09:45:09.466739327 +0200
+++ sort.16384.3 2024-06-08 09:52:32.837243809 +0200
@@ -13113,2 +13113,3 @@
0.8005079965106629558135864499490708112716674804687500000000000000000
+00000
+0.8005079965106629558135864499490708112716674804687500000000000000000
+00000
0.8007896602698270083209308722871355712413787841796875000000000000000
+00000
@@ -13686,3 +13687,2 @@
0.8361851909041919661547126452205702662467956542968750000000000000000
+00000
-0.8361968170640460273901339860458392649888992309570312500000000000000
+00000
0.8364519442006479454931877626222558319568634033203125000000000000000
+00000
@@ -14417,3 +14417,2 @@
0.8818249357757870221519169717794284224510192871093750000000000000000
+00000
-0.8818467386417550013533173114410601556301116943359375000000000000000
+00000
0.8818802323789649566521120505058206617832183837890625000000000000000
+00000
@@ -16095,2 +16094,3 @@
0.9834468387403769717991508514387533068656921386718750000000000000000
+00000
+0.9834468387403769717991508514387533068656921386718750000000000000000
+00000
0.9837232694387190168328061190550215542316436767578125000000000000000
+00000
$ diff -U 1 sort.16384.1 sort.16384.4|head -12
--- sort.16384.1 2024-06-08 09:45:09.466739327 +0200
+++ sort.16384.4 2024-06-08 09:58:27.630459409 +0200
@@ -1,154 +1,161 @@
-0.0000798345195128779969639606917120033813262125477194786071777343750
+00000
+0.0000273717076871849010000561919220274376129964366555213928222656250
+00000
+0.0000550196742872266023331902229376311197484028525650501251220703125
+00000
0.0001113510803039119965475445273028753945254720747470855712890625000
+00000
-0.0001282989599132870110840404231922207145544234663248062133789062500
+00000
0.0002144808569095600127036443938166598854877520352602005004882812500
+00000
-0.0003265905868672370191731213484587215134524740278720855712890625000
+00000
0.0003745585362475710206227319520877472314168699085712432861328125000
+00000
-0.0004606498956905009760365299342765865731053054332733154296875000000
+00000
$ diff -U 0 sort.16384.1 sort.16384.4|grep -v '@@'|wc -l
19698
I have no idea what's going on here but maybe someone else has.
Update: Fixed manually modified file names in diff output.
Greetings, 🐻
$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
| [reply] [d/l] [select] |
|
|
|
|
I'd have expected calling srandom inside new threads to behave better. Could you take a look at the srand code and see if anything is obviously wrong? Also, are you able to see if the duplicates are in groups i.e. sequences?
| [reply] |
|
See my reply to Rob. In short, calling PDL::srandom before spawning workers has no effect. The workaround is to call CORE::srand instead.
See also, non-thread testing.
Edit: MCE checks for PDL::Primitive->can('srand'), but missed checking PDL::Primitive->can('srandom'). Resolved in MCE v1.894 and MCE::Shared v1.889.
| [reply] [d/l] [select] |
|