WouterVG has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I am currently trying to shuffle the codons of a DNA sequence. The sequence needs to be split into groups of 3 characters, and those groups need to be shuffled. So far I have been able to split the sequence into groups of 3, but I have not succeeded in shuffling them randomly. I am incapable of installing the list utils... I cannot seem to find a good tutorial for macOS... And I am unable to correctly introduce the Fisher Yates algorithm into my source code... Another noob has entered the monastery... My code so far:

    print "enter sequence and signal end with enter followed by ctrl d\n";
    $sequence = <STDIN>;
    chomp $sequence;
    print "sequence inserted : $sequence\n";
    @trips = unpack("a3" x (length($sequence)-2), $sequence);
    @trips = join(" ", @trips);

So I am looking to shuffle @trips and then store the result in @shuftrips, for example.

Kind regards,

Wouter

Re: Shuffling CODONS
by BrowserUk (Patriarch) on Jun 07, 2018 at 13:51 UTC

    Here's a proper (tested), efficient, in-place, pure Perl, Fisher Yates shuffle:

        sub shuffleAry {
            die 'Need array reference' unless ref( $_[0] ) eq 'ARRAY';
            our( @aliased, $a, $b );
            local( *aliased, $a, $b ) = $_[0];
            $a = $_ + rand @aliased - $_,
            $b = $aliased[ $_ ],
            $aliased[ $_ ] = $aliased[ $a ],
            $aliased[ $a ] = $b
                for 0 .. $#aliased;
            return;
        }

        my @array = 1..100;
        shuffleAry( \@array );
        print @array;
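For readers who find the aliasing version dense, here is a minimal sketch of the same in-place Fisher-Yates idea written with an explicit loop and a slice swap. The sub name `fisher_yates_shuffle` and the sample codons are made up for illustration; this is not the code from the reply above, just one way the algorithm might be spelled out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Shuffle the array behind the given reference in place (Fisher-Yates).
sub fisher_yates_shuffle {
    my ($array) = @_;
    die 'Need array reference' unless ref($array) eq 'ARRAY';
    for my $i ( 0 .. $#$array ) {
        # Pick a random index $j with $i <= $j <= last index.
        my $j = $i + int rand( @$array - $i );
        @$array[ $i, $j ] = @$array[ $j, $i ];    # swap via array slice
    }
    return;
}

my @trips = qw(GAT TAC CAT);    # made-up example codons
fisher_yates_shuffle( \@trips );
print "@trips\n";
```

Passing a reference keeps the shuffle in place, so no copy of a long codon list is made.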

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
Re: Shuffling codons
by hippo (Archbishop) on Jun 07, 2018 at 13:29 UTC

    It's an FAQ!

    I am incapable of installing the list utils

    List::Util is in core and has been since 5.7.3 so you probably don't even need to install it. Either way, I'd tackle this problem (module installation) if I were in your shoes, as fixing it will solve many more problems than just the one in your post.

    In what way are you incapable of installing it?

      Thanks for your reply. I know it is an FAQ; that's where I got the Fisher Yates option from, but I am unable to correctly implement it in my code.

      If I try the list util it doesn't work... My code looks like this:

          print "enter sequence and signal end with enter followed by ctrl d\n";
          $sequence = <STDIN>;
          chomp $sequence;
          print "sequence inserted : $sequence\n";
          @trips = unpack("a3" x (length($sequence)-2), $sequence);
          @trips = join(" ", @trips);
          use List::Util 'shuffle';
          @shuffled = shuffle(@trips);
          print "@shuffled\n";

      What is wrong/missing here?

      I am incapable of installing the list::utils; I wouldn't know where to unpack them or how to properly install and run them. Do they need to be situated somewhere? I am on perl v5.18.2, so the list utils should already be there you mention?

      How would I implement the fisher Yates algorithm correctly?

        perl v5.18.2 came after perl v5.7.3, so yes, it should be installed for you. However, the name is List::Util, not list::utils. The following shows that list::utils doesn't exist, but that List::Util exists and has been in core since v5.7.3:

            pryrt@debianvm:~$ perl -Mlist::utils -le 'print "OK"'
            Can't locate list/utils.pm in @INC (you may need to install the list::utils module) (@INC contains: ./blib/lib /home/pryrt/perl5/perlbrew/perls/perl-5.20.3/lib/site_perl/5.20.3/i686-linux /home/pryrt/perl5/perlbrew/perls/perl-5.20.3/lib/site_perl/5.20.3 /home/pryrt/perl5/perlbrew/perls/perl-5.20.3/lib/5.20.3/i686-linux /home/pryrt/perl5/perlbrew/perls/perl-5.20.3/lib/5.20.3 .).
            BEGIN failed--compilation aborted.
            pryrt@debianvm:~$ perl -MList::Util -le 'print "OK"'
            OK
            pryrt@debianvm:~$ corelist List::Util

            Data for 2015-09-12
            List::Util was first released with perl v5.7.3

        ... Besides, if your example code, which included use List::Util 'shuffle'; compiled at all, then you already knew you had it installed properly.

        Oh, there, that's what you're doing wrong: @trips = join(" ", @trips);. You just replaced the contents of the @trips array with a single value, which is a string which joins all the old elements of @trips with a space. I think what you want is more akin to:

            #!/usr/bin/env perl
            use warnings;
            use strict;

            print "enter sequence and signal end with enter followed by ctrl d\n";
            my $sequence = <STDIN>;
            chomp $sequence;
            print "sequence inserted : $sequence\n";
            my @trips = unpack("a3" x (length($sequence)-2), $sequence);
            local $" = ", ";
            print "unshuffled: (@trips)\n";

            use List::Util 'shuffle';
            my @shuffled = shuffle(@trips);
            print "shuffled: (@shuffled)\n";
            __END__
            __RESULTS__
            enter sequence and signal end with enter followed by ctrl d
            GATTACCAT
            sequence inserted : GATTACCAT
            unshuffled: (GAT, TAC, CAT, , , , )
            shuffled: (, , GAT, TAC, CAT, , )

        If I try the list util it doesnt work

        Not enough information there. Error message? Segfault? Compilation error?

        What is wrong/missing here?

        This line:

        @trips = join(" ", @trips);

        ruins it. What do you understand this line to be doing? Without it I get at least some shuffling which may or may not be what you want:

            #!/usr/bin/env perl
            use strict;
            use warnings;
            use List::Util 'shuffle';

            print "enter sequence and signal end with enter followed by ctrl d\n";
            my $sequence = <STDIN>;
            chomp $sequence;
            print "sequence inserted : $sequence\n";
            my @trips = unpack("a3" x (length($sequence)-2), $sequence);
            my @shuffled = shuffle(@trips);
            print "@shuffled\n";

        which gives:

            $ perl /tmp/shuf.pl
            enter sequence and signal end with enter followed by ctrl d
            aaabbbcccddd
            sequence inserted : aaabbbcccddd
            ccc aaa ddd bbb

        I am on perl v5.18.2, so the list utils should already be there you mention?

        Yes.

Re: Shuffling CODONS
by tybalt89 (Monsignor) on Jun 07, 2018 at 13:39 UTC

    My favorite shuffle

        #!/usr/bin/perl

        # http://perlmonks.org/?node_id=1216102

        use strict;
        use warnings;

        my @trips = 'a' .. 'z';

        my @shuftrips = myshuffle( @trips );

        print "@shuftrips\n";

        sub myshuffle
          {
          map $_->[0], sort { $a->[1] <=> $b->[1] } map [ $_, rand ], @_;
          }

      Why is that a favorite? I get that it'll shuffle anything correctly, but paying O(N log N) for a job that Fisher-Yates does in O(N) is painful:

      Benchmark:

                     Rate   tybalt     buk1     buk2 ListUtil
          tybalt    224/s       --     -65%     -70%     -97%
          buk1      644/s     187%       --     -14%     -91%
          buk2      746/s     233%      16%       --     -90%
          ListUtil 7342/s    3176%    1040%     884%       --

          C:\test>shufflesBenchmark2018.pl -SIZE=1e3
                     Rate   tybalt     buk1     buk2 ListUtil
          tybalt    229/s       --     -66%     -70%     -97%
          buk1      675/s     195%       --     -13%     -91%
          buk2      776/s     238%      15%       --     -89%
          ListUtil 7339/s    3102%     987%     846%       --

          C:\test>shufflesBenchmark2018.pl -SIZE=1e4
                     Rate   tybalt     buk1     buk2 ListUtil
          tybalt   17.2/s       --     -75%     -78%     -98%
          buk1     67.5/s     292%       --     -13%     -90%
          buk2     77.2/s     349%      14%       --     -89%
          ListUtil  704/s    3992%     944%     812%       --

          C:\test>shufflesBenchmark2018.pl -SIZE=1e5
          (warning: too few iterations for a reliable count)
                     Rate   tybalt     buk1     buk2 ListUtil
          tybalt   1.21/s       --     -81%     -84%     -98%
          buk1     6.40/s     430%       --     -15%     -89%
          buk2     7.53/s     523%      18%       --     -87%
          ListUtil 56.4/s    4566%     781%     649%       --
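The benchmark script itself is not shown in the thread. As a rough sketch of how such a comparison might be set up with the core Benchmark module, something like the following could be used; the sub name `sort_shuffle`, the labels, and the array size are assumptions for illustration, and the numbers will differ from those above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(shuffle);

# Sort-based shuffle in the style of the reply above (O(N log N)).
sub sort_shuffle {
    return map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, rand ] } @_;
}

my @data = 1 .. 1000;    # arbitrary size for illustration

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese( -1, {
    sort_based => sub { my @s = sort_shuffle(@data) },
    list_util  => sub { my @s = shuffle(@data) },
} );
```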


      Thanks for your reply! If I implement it as below, it doesn't change the order of the codons.

          print "enter sequence and signal end with enter followed by ctrl d\n";
          $sequence = <STDIN>;
          chomp $sequence;
          print "sequence inserted : $sequence\n";
          @trips = unpack("a3" x (length($sequence)-2), $sequence);
          @trips = join(" ", @trips);

          my @shuftrips = &myshuffle( @trips );

          print "@shuftrips\n";

          sub myshuffle
            {
            map $_->[0], sort { $a->[1] <=> $b->[1] } map [ $_, rand ], @_;
            }

        Because @trips = join(" ", @trips); joins all codons into ONE string!
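A small sketch of what that assignment does, using made-up codons: counting the elements before and after shows that the array collapses to a single string, which leaves nothing to shuffle. Keeping the array intact and joining only for display avoids the problem.

```perl
use strict;
use warnings;

my @trips = ( 'GAT', 'TAC', 'CAT' );    # made-up example codons
print scalar(@trips), " elements before join\n";

# join returns ONE string; assigning it to an array gives a one-element array.
my @joined = join( " ", @trips );
print scalar(@joined), " element after join: '$joined[0]'\n";

# Keep the array for shuffling and join into a scalar only for display.
my $display = join( " ", @trips );
print "codons: $display\n";
```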

Re: Shuffling CODONS
by thanos1983 (Parson) on Jun 07, 2018 at 13:39 UTC

    Hello WouterVG,

    Welcome to the Monastery. You can search the forum; it contains a lot of information, and your question has been asked before, e.g. How do I shuffle an array?

    Something like this should do what you want, assuming you want to shuffle the array elements (3 characters each).

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Data::Dumper;
        use List::Util qw(shuffle);

        my $sequence = "CTGCAC";
        # chomp(my $sequence = <STDIN>);

        my @data;
        push @data, $1 while ($sequence =~ /(.{1,3})/msxog);
        print Dumper \@data;

        my @random = shuffle @data;
        print Dumper \@random;

        __END__

        $ perl test.pl
        $VAR1 = [
                  'CTG',
                  'CAC'
                ];
        $VAR1 = [
                  'CAC',
                  'CTG'
                ];

    Update: In case you want to use unpack, see below:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Data::Dumper;
        use List::Util qw(shuffle);

        my $sequence = "CTGCAC";
        # chomp($sequence = <STDIN>);

        my @trips = unpack("(A3)*", $sequence);
        print Dumper \@trips;

        my @shuffled = shuffle @trips ;
        print Dumper \@shuffled;

        __END__

        perl test.pl
        $VAR1 = [
                  'CTG',
                  'CAC'
                ];
        $VAR1 = [
                  'CAC',
                  'CTG'
                ];

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Shuffling CODONS
by bliako (Abbot) on Jun 08, 2018 at 12:44 UTC

    Apart from time benchmarks, the suitability of the shuffle algorithm must be assessed with respect to the quality of the randomness of the shuffled array. One way to do this is to calculate the auto-correlation of the shuffled sequence with lag 1 (looking at consecutive elements). The absolute value of the a-c coefficient approaches 1 when the sequence is highly auto-correlated (for example the test array 1..1000) and zero when the opposite happens. So, a good quality shuffle should produce auto-correlations approaching zero.

    Edit: suggested test scenario: start with a highly correlated array (e.g 1..1000: perl -MStatistics::Autocorrelation -e 'print Statistics::Autocorrelation->new()->coefficient(data=>[1..1000],lag=>1)."\n"' yields 0.997) and see how the shuffling algorithm de-auto-correlates it by lowering its auto-correlation coefficient towards zero.

    Edit 2: the auto-correlation coefficient is in the range -1 to 1. Both extremes indicate highly auto-correlated sequences, and zero indicates no auto-correlation. In this test I take the absolute value of the coefficient.
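To make the measure concrete, here is a minimal pure-Perl sketch of a lag-1 auto-correlation coefficient. It uses one common definition (sum of products of consecutive deviations from the mean, divided by the sum of squared deviations); whether Statistics::Autocorrelation uses exactly this estimator is an assumption, although for 1..n this definition works out to 1 - 3/n, i.e. 0.997 for n = 1000. The sub name is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum shuffle);

# Lag-1 autocorrelation: sum over i of (x[i]-mean)*(x[i+1]-mean),
# divided by the sum of squared deviations from the mean.
sub lag1_autocorrelation {
    my @x    = @_;
    my $mean = sum(@x) / @x;
    my $den  = sum( map { ( $_ - $mean )**2 } @x );
    my $num  = sum( map { ( $x[$_] - $mean ) * ( $x[ $_ + 1 ] - $mean ) } 0 .. $#x - 1 );
    return $num / $den;
}

my @ordered = 1 .. 1000;
printf "ordered:  %.3f\n", lag1_autocorrelation(@ordered);
printf "shuffled: %.3f\n", abs lag1_autocorrelation( shuffle @ordered );
```

The shuffled value varies from run to run; a good shuffle should keep it close to zero.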

    The following script compares the three methods mentioned here by BrowserUK, tybalt89, List::Util/shuffle with respect to auto-correlation and also, for each trial it plots a histogram of the differences between consecutive elements of the shuffled array, just for fun.

    The best shuffle is the one that produces the lowest mean auto-correlation with the lowest variance and the most successes (a success meaning it had the minimum auto-correlation in a given trial).

        ./fisher_yates.pl : after 5000 trials shuffling arrays of size 1000:
        List::Util::shuffle : 1693 successes, mean:0.0105896962736892, stdev:0.00900688731621982
        BUK : 1685 successes, mean:0.010799062825769, stdev:0.00921403469412604
        tybalt89 : 1622 successes, mean:0.0102906705829024, stdev:0.00843760632828801

    once more:

        ./fisher_yates.pl : after 5000 trials shuffling arrays of size 1000:
        BUK : 1696 successes, mean:0.0104235933728858, stdev:0.00897497055761236
        List::Util::shuffle : 1690 successes, mean:0.0106133000677379, stdev:0.00908235156157047
        tybalt89 : 1614 successes, mean:0.0100835174626996, stdev:0.00897955319759652

    once more:

        ./fisher_yates.pl : after 5000 trials shuffling arrays of size 1000:
        List::Util::shuffle : 1690 successes, mean:0.0104611128054915, stdev:0.00886345338184372
        BUK : 1658 successes, mean:0.0102429744950854, stdev:0.00848038138137249
        tybalt89 : 1652 successes, mean:0.0105683142305418, stdev:0.00899061563593633

    My opinion: all algorithms work well with respect to randomness (as assessed by auto-correlation) and now we can move to time benchmarks.

    TODO: try with a different random number generator (i.e. more reliably uniform).

    The test program:

      TODO: try with a different random number generator (i.e. more reliably uniform)

      Unless the size of the array is approaching the period length of the PRNG, then the quality of the PRNG has little or no effect upon the quality of the shuffling.

      Neither the quality of any given shuffle; nor the quality of successive shuffles; nor the quality of any group of shuffles.

      I'm not going to try and explain that beyond saying it is the nature of modular arithmetic; but by way of evidence I offer this. The standard PRNG used in perl 5.10 on windows is the notorious MSC rand that has only 15-bits: 0-32767.

      However, if you use F-Y to shuffle any size array with fewer than 32767 elements, no matter how many times you do it (within the bounds of reasonable values: say a human lifetime), you will not detect any bias using simple std deviations or other simple correlation tools.

      Eg. Run this code with any $L < 32767, and any $N, and try to find some bias using your test and you will fail:

          #! perl -slw
          use strict;
          use Data::Dump qw[ pp ];

          our $N //= 1e6;
          our $L //= 50;

          my %counts;
          ++$counts{ int( rand $L ) } for 1 .. $N;
          pp \%counts;

          C:\test>junk77 -L=5 -N=1e7
          { "0" => 1999935, 1 => 2000440, 2 => 1999682, 3 => 2001882, 4 => 1998061 }

          C:\test>junk77 -L=5 -N=1e7
          { "0" => 1999465, 1 => 2000290, 2 => 1999884, 3 => 1999629, 4 => 2000732 }

          C:\test>junk77 -L=5 -N=1e7
          { "0" => 1999025, 1 => 1999024, 2 => 2000250, 3 => 1999085, 4 => 2002616 }

          C:\test>junk77 -L=5 -N=1e7
          { "0" => 1999941, 1 => 2001174, 2 => 1998446, 3 => 1999105, 4 => 2001334 }

          C:\test>junk77 -L=5 -N=1e7
          { "0" => 1998594, 1 => 1999564, 2 => 2002043, 3 => 2000208, 4 => 1999591 }

          C:\test>junk77 -L=5 -N=1e8
          { "0" => 19998201, 1 => 20012390, 2 => 19994928, 3 => 19998284, 4 => 19996197 }

          C:\test>junk77 -L=50 -N=1e7
          {
            "0" => 199673, 1 => 200117, 2 => 199619, 3 => 200785, 4 => 199947,
            5 => 201072, 6 => 200075, 7 => 200212, 8 => 200173, 9 => 200781,
            10 => 199524, 11 => 200163, 12 => 199981, 13 => 200973, 14 => 200483,
            15 => 199633, 16 => 199506, 17 => 200081, 18 => 199572, 19 => 200733,
            20 => 198890, 21 => 200602, 22 => 199665, 23 => 199819, 24 => 199935,
            25 => 199939, 26 => 199868, 27 => 199960, 28 => 199116, 29 => 199926,
            30 => 200444, 31 => 200205, 32 => 199426, 33 => 199787, 34 => 199578,
            35 => 199312, 36 => 200249, 37 => 199743, 38 => 201357, 39 => 200411,
            40 => 200164, 41 => 200179, 42 => 199436, 43 => 199302, 44 => 200279,
            45 => 199640, 46 => 199267, 47 => 199733, 48 => 199664, 49 => 201001,
          }


      Maybe this explains it better?

      The following program uses this heavily biased PRNG:

      sub badRand($) { rand() < 0.5 ? 0 : rand( $_[0] ) }

      which is designed to produce 0, 50% of the time. That means that when asked for numbers 0-9, it produces '0', 10 times more often than any of the other values:

          [549637, 50008, 50261, 50195, 50365, 49871, 50000, 49692, 49768, 50203]

      And when asked for 0..23, produces '0' 25 times more often than any other number:

          [
            521654, 20747, 20716, 20614, 20914, 20740, 20906, 20580,
            20625, 21077, 20736, 20912, 20835, 20820, 20958, 20818,
            20866, 20818, 20969, 20646, 20749, 20953, 20454, 20893,
          ]

      But when that biased PRNG is used within a standard Fisher Yates shuffle, the shuffles are fair, no matter how many times you run it:

          {
            ABCD => 67436, ABDC => 67742, ACBD => 67990, ACDB => 68277,
            ADBC => 68018, ADCB => 67846, BACD => 67766, BADC => 67859,
            BCAD => 68598, BCDA => 68919, BDAC => 69172, BDCA => 67870,
            CABD => 67755, CADB => 67192, CBAD => 67657, CBDA => 67731,
            CDAB => 67793, CDBA => 67680, DABC => 67755, DACB => 68016,
            DBAC => 68098, DBCA => 67988, DCAB => 67666, DCBA => 68408,
          }

      The test code:

          #! perl -slw
          use strict;
          use Data::Dump qw[ pp ];

          sub badRand($) { rand() < 0.5 ? 0 : rand( $_[0] ) }

          sub shuffle {
              $a = $_ + badRand( @_ - $_ ),
              $b = $_[$_],
              $_[$_] = $_[$a],
              $_[$a] = $b
                  for 0 .. $#_;
              return @_;
          }

          my @rands;
          ++$rands[ badRand( 10 ) ] for 1 .. 1e6;
          pp \@rands;
          <STDIN>;

          my @vals = ( 'A' .. 'D' );
          my %tests;
          my $c = 0;
          while( 1 ) {
              ++$tests{ join '', shuffle( @vals ) };
              unless( ++$c % 576 ) {
                  system 'cls';
                  pp \%tests;
              }
          }
          __END__
          c:\test> junk77
          [549637, 50008, 50261, 50195, 50365, 49871, 50000, 49692, 49768, 50203]
          {
            ABCD => 67436, ABDC => 67742, ACBD => 67990, ACDB => 68277,
            ADBC => 68018, ADCB => 67846, BACD => 67766, BADC => 67859,
            BCAD => 68598, BCDA => 68919, BDAC => 69172, BDCA => 67870,
            CABD => 67755, CADB => 67192, CBAD => 67657, CBDA => 67731,
            CDAB => 67793, CDBA => 67680, DABC => 67755, DACB => 68016,
            DBAC => 68098, DBCA => 67988, DCAB => 67666, DCBA => 68408,
          }

        But when that biased PRNG is used within a standard Fisher Yates shuffle, the shuffles are fair, no matter how many times you run it

        I am not convinced.

        A statistical test for assessing statistical fairness is chi-square: throw a die a million times and feed chi-square with the counts for each outcome: '1' => ..., '2' => ... , ... '6' => .... It will tell you whether it thinks the die is fair or not.
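As a minimal sketch of the idea, the Pearson chi-square statistic against a uniform expectation can be computed by hand. The sub name and the die counts below are made up for illustration, and the comparison uses the standard 5% critical value for 5 degrees of freedom (11.07) rather than the probability bands that Statistics::ChiSquare reports.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Pearson chi-square statistic against a uniform expectation:
# sum over outcomes of (observed - expected)^2 / expected.
sub chi_square_uniform {
    my @observed = @_;
    my $expected = sum(@observed) / @observed;
    return sum( map { ( $_ - $expected )**2 / $expected } @observed );
}

# Made-up counts for the six faces of a die thrown a million times.
my @counts = ( 166_500, 166_900, 166_300, 167_100, 166_400, 166_800 );

my $stat     = chi_square_uniform(@counts);
my $critical = 11.07;    # chi-square, 5 degrees of freedom, 5% level

printf "chi-square = %.2f: %s\n", $stat,
    $stat > $critical ? 'looks biased' : 'no evidence of bias';
```

For the 24 permutations of a four-element shuffle, the same sub applies with 23 degrees of freedom, whose 5% critical value is 35.17.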

        So I ran your test code with 3 rand-subs:

            use Math::Random::MT;
            my $mt = Math::Random::MT->new();

            sub badRand($) { rand() < 0.5 ? 0 : rand( $_[0] ) }
            sub goodRand($) { rand($_[0]) }
            sub bestRand($) { $mt->rand($_[0]) }

        Then I ran the chi-squared test (Statistics::ChiSquare) on the counts of the shuffle (re: the output of your test code).

        This is the result:

        bad_shuffle : There's a <1% chance that this data is random. 
        good_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
        best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.

        or

        bad_shuffle : There's a <1% chance that this data is random.
        best_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
        good_shuffle : There's a >10% chance, and a <25% chance, that this data is random.
        

        or

        bad_shuffle : There's a <1% chance that this data is random.
        best_shuffle : There's a >25% chance, and a <50% chance, that this data is random.
        good_shuffle : There's a >50% chance, and a <75% chance, that this data is random.
        

        The above test consistently rates "bad_shuffle" based on badRand() as producing data with less than 1% chance of being random. It is undecided, let's say, which is best for this particular shuffle: Mersenne or Perl.

        The results your test code produced may look fair, but they ain't according to this particular test; it very much depends on the use case (see end of post). You may be right in your hunch that Mersenne offers nothing more than perl's standard rand() for an FY shuffle, but hey, let's not exaggerate with badRand()! :)

        This is my test code:

        sidenote:

        A particular shuffle may be appropriate for a particular situation.

        Such as?

        1. poker and co where patterns in the shuffled cards may influence the game. The shuffled cards must be tested for the number of patterns they contain.
        2. medical trials : a list of "same-category" (e.g. all healthy) patients must be shuffled in order to then separate them to different groups for trial treatment.
        3. related to #2 : I have a group with certain average properties. If I then split it randomly into subgroups after a shuffle, does each subgroup have on average the same properties? Or have I grouped together the tall boys and put the short boys in another group, because my shuffle algorithm was not suitable?
Re: Shuffling CODONS
by AnomalousMonk (Archbishop) on Jun 07, 2018 at 17:32 UTC

    WouterVG:   Further to thanos1983's reply:   The reason thanos1983's use of the  '(a3)*' unpacking template in the Update unpack example given there is much better than the dynamically generated template used in the OPed
        @trips = unpack("a3" x (length($sequence)-2), $sequence);
    statement is that the latter produces many (possibly very many) spurious fields:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use Data::Dumper;
         ;;
         my $sequence = 'ABCDEFG';
         print qq{sequence entered: '$sequence'};
         warn sprintf 'sequence length (%d) not multiple of 3', length($sequence)
             if length($sequence) % 3;
         ;;
         my @trips = unpack('a3' x (length($sequence)-2), $sequence);
         print Dumper \@trips;
         ;;
         @trips = unpack('(a3)*', $sequence);
         print Dumper \@trips;
        "
        sequence entered: 'ABCDEFG'
        sequence length (7) not multiple of 3 at -e line 1.
        $VAR1 = [
                  'ABC',
                  'DEF',
                  'G',
                  '',
                  ''
                ];
        $VAR1 = [
                  'ABC',
                  'DEF',
                  'G'
                ];
    (Note that Data::Dumper is a core module.)
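Putting the pieces together, one possible sketch of the whole task: split with the '(a3)*' template, set aside any incomplete trailing codon, and shuffle the rest into @shuftrips. The helper name `split_codons`, the input string, and the choice to drop the leftover bases are assumptions for illustration; a real pipeline may want to handle a partial codon differently.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Split a sequence into codons; return the complete codons (array ref)
# and any leftover bases that do not fill a whole codon.
sub split_codons {
    my ($sequence) = @_;
    my @trips = unpack( '(a3)*', $sequence );
    my $leftover = ( @trips && length( $trips[-1] ) < 3 ) ? pop @trips : '';
    return ( \@trips, $leftover );
}

my ( $trips, $leftover ) = split_codons('GATTACCATGG');    # made-up input
warn "ignoring incomplete codon '$leftover'\n" if length $leftover;

my @shuftrips = shuffle @$trips;
print "original: @$trips\n";
print "shuffled: @shuftrips\n";
```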


    Give a man a fish:  <%-{-{-{-<

Re: Shuffling CODONS
by swl (Prior) on Jun 10, 2018 at 09:37 UTC
Re: Shuffling CODONS
by Anonymous Monk on Jun 07, 2018 at 18:32 UTC

    Hello there. I realize this may be just a learning exercise, but the mention of a 'proper' shuffle came up above...

    A shuffle is a random list permutation, of which there are n! possibilities. A 'proper' shuffle would emphasize the quality of the random source. A PRNG needs at least log2(n!) bits of internal state to be able to generate all possible permutations. List::Util shuffle is likely using drand48.

    $ perl -MList::Util=sum -e 'print 1/log(2) * sum map log, 1..17;'
    48.337603311133
    $ perl -MList::Util=sum -e 'print 1/log(2) * sum map log, 1..1000;'
    8529.39800420478
    

    As you can see, 48 bits is not enough to properly shuffle a list of 17 elements. For one thousand element shuffle, more than a kilobyte of randomness is required.
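The arithmetic behind that claim can be sketched as a small search for the first list length whose permutation count exceeds a given number of state bits. The sub name is hypothetical and the 48-bit figure is the state size assumed in this post; the sketch only illustrates the counting argument and says nothing about how detectable the effect is in practice.

```perl
use strict;
use warnings;

# Bits needed to index every permutation of n items: log2(n!) = sum of log2(k).
sub permutation_bits {
    my ($n) = @_;
    my $bits = 0;
    $bits += log($_) / log(2) for 2 .. $n;
    return $bits;
}

my $state_bits = 48;    # assumed PRNG state size, as in the post
my $n = 1;
$n++ while permutation_bits($n) <= $state_bits;

printf "smallest n with log2(n!) > %d: %d (%.2f bits)\n",
    $state_bits, $n, permutation_bits($n);
```

This reports n = 17, in line with the first one-liner above.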

      As you can see, 48 bits is not enough to properly shuffle a list of 17 elements. For one thousand element shuffle, more than a kilobyte of randomness is required.

      Utter twaddle! What I see is someone adding 2 + 1/log2 * sum of the log2 of an arbitrary list and drawing a random, and wrong, conclusion.

      The Knuth-Fisher-Yates shuffle only needs to be able to pick 1 from N -- that is, generate a uniform random number between 1 and N; where N is the number of elements in the array -- as proven by Donald Knuth, arguably the greatest Computer Scientist of the current era.

      Discuss.



        "arguably the greatest Computer Scientist of the current era. "

        But they were not called the "Donald of Computer Programming", he looked up to someone else

        "One of the first times I was ever asked about the title of my books was in 1966, during the last previous ACM national meeting held in Southern California. This was before any of the books were published, and I recall having lunch with a friend at the convention hotel. He knew how conceited I was, already at that time, so he asked if I was going to call my books "An Introduction to Don Knuth." I replied that, on the contrary, I was naming the books after him. His name: Art Evans. (The Art of Computer Programming, in person.) "
        http://www.paulgraham.com/knuth.html
        https://cacm.acm.org/magazines/1974/12/11626-computer-programming-as-an-art/abstract

        "Always Mount a Scratch Monkey" https://www.acme.com/jef/netgems/scratch_monkey.html

        ... uniform random number between 1 and N; where N is the number of elements in the array ...
        Correction: several such random numbers. Alternatively, one random number 1..N, where N is the number of permutations.

        The shuffle algorithm needs a random number sequence of finite length. A PRNG with 48 bits of internal state can generate at most 2^48 different sequences (of some particular length), because it is deterministic. If there are more permutations than that, some will never be selected.
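The determinism part of this argument is easy to see in a sketch: seeding Perl's built-in generator with the same value twice yields the same shuffle twice, so the number of distinct shuffles cannot exceed the number of distinct generator states. The sub name `fy_shuffle` and the seed are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# Plain Fisher-Yates driven by Perl's built-in rand; returns a shuffled copy.
sub fy_shuffle {
    my @items = @_;
    for my $i ( 0 .. $#items ) {
        my $j = $i + int rand( @items - $i );
        @items[ $i, $j ] = @items[ $j, $i ];
    }
    return @items;
}

my @deck = 1 .. 17;

srand(12345);                     # arbitrary seed
my @first = fy_shuffle(@deck);
srand(12345);                     # same seed again
my @second = fy_shuffle(@deck);

print "first:  @first\n";
print "second: @second\n";
print "identical\n" if "@first" eq "@second";
```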

        Those using GNU/Linux can take a quick glimpse at the working of the standard shuffle tool, shuf.

        $ strace -s0 -e open,read shuf -o /dev/null -i 1-17
        ...
        open("/dev/urandom", O_RDONLY)          = 3
        read(3, ""..., 11)                      = 11
        ...
        $ strace -s0 -e open,read shuf -o /dev/null -i 1-1000
        ...
        open("/dev/urandom", O_RDONLY)          = 3
        read(3, ""..., 1250)                    = 1250
        ...
        
        Here, in order to shuffle a list of integers 1-17, shuf actually wants 11 random bytes. For 1000 elements, shuf reads 1250 bytes of /dev/urandom.