Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
There was advice to use brute force, at least as accepted answer in linked question (from 2021) there. Not sure if I got it right, I don't read Ruby, and didn't try other answers. Both brute subroutines below aren't actually used: they are totally unusable for lists with ~15 unique strings or more, plus any decent amount of duplicates. I wrote 2nd one because I couldn't believe it at first (accepted solution??). They are left just in case anyone wants to try (or point at my mistakes in implementation?), and can be ignored.
Back to answers at SO, there are 2 Perl solutions. One (a) doesn't compile; (b) if fixed, emits a warning for un-initialized value; (c) if fixed (or ignored), it seems to work OK. But for (corner-case, of course) input of (b,a,a), it gives answer (b,a). I didn't look further.
Another solution (by esteemed monk) fails randomly for e.g. corner-case (a,a,a,b,b) -- the only answer can be (a,b,a,b,a), of course. Why does it fail? Output list is initialized to e.g. (a,b). If 1st key to iterate is "a", then one of 2 remaining "a"'s is added to give (a,b,a) with no place for 2nd remaining "a". So, easy fix would be kind of "breadth-first" hash consumption. I'm sorry if code I had to add looks ugly to the author.
This fixed version will serve as reference to compare my solution to, it generates truly random lists.
With algorithm I suggest -- ask for twice as many random indexes from remaining pool, then simply reject (comb out) half of them. It guarantees there will be no consecutive dupes (and of course doesn't mean "only odd or even indexes for this value").
One obvious compromise on randomness will be "dupes are never placed at both head and tail" -- except corner-cases such as 'aba' or 'abaca', of course. There are actually 3 cases, depending on size of remaining pool. Cases "2" and "3" restrict randomness further. E.g., for 'aaaabbbcc', the 'c' is never placed at indexes 0 or 1 -- unlike the "reference SO implementation with true randomness".
However, lines with "die" in them can be un-commented (and they were un-commented during benchmarking) if input is not an artificial corner-case -- this code is never reached with realistic data. I mean, other than corner-cases and head/tail restriction, my algorithm seems to produce random enough result.
(In fact, one of "requests" of "RFC" is how to estimate randomness (entropy) for multiple runs of subroutine. Didn't look into that yet.)
Further "requests" are: can it be improved? Both List::MoreUtils::samples and e.g. (unused) Math::Prime::Util::randperm return their result shuffled, which I don't need and have to sort back to order! And more, e.g. samples takes random samples and therefore should know which items were unselected, but I have no better way to find out "which" except with more work using singleton. It feels like huge amount of unnecessary work I do (though it's still much faster than "SO reference solution"). Or maybe, perhaps, someone would suggest even faster solution?
(+ I understand there's sloppiness on my side in e.g. $uniq variable name doesn't actually mean number of unique items which fake_data returns. I hope this (and similar) can be forgiven.)
use strict; use warnings; use feature 'say'; use List::Util qw/ shuffle /; use List::MoreUtils qw/ part samples singleton /; use ntheory qw/ forperm lastfor /; use Algorithm::Combinatorics qw/ permutations /; use Benchmark 'cmpthese'; my @input = shuffle( qw( a a a a b b b c c )); # corner-cases @input = shuffle( qw( a a a b b )); # srand 123; @input = fake_data( 555, 55 ); #say scalar @input; # 2096 sub fake_data { my ( $uniq, $pivot ) = @_; my @tmp = map { sprintf '< %06d >', rand 1e9 } 0 ... $uniq; my @out; push @out, @tmp[ 0 .. $_ ] for 0 .. $pivot; @out = shuffle( @out, @tmp[ $pivot + 1 .. $uniq ]); return @out } cmpthese 10, { SO_fixed => sub { die unless SO_fixed( \@input )}, my_shuffle => sub { die unless my_shuffle( \@input )}, }; sub brute { my $input_ref = shift; my @output; forperm { my $prev = ''; for ( @_ ) { return if $prev eq $input_ref-> [ $_ ]; $prev = $input_ref-> [ $_ ] } @output = @{ $input_ref }[ @_ ]; lastfor, return } @$input_ref; return \@output } sub brute2 { my $input_ref = shift; my @output; my $iter = permutations( $input_ref ); PERM: while ( my $p = $iter-> next ) { my $prev = ''; for ( @$p ) { next PERM if $prev eq $_; $prev = $_ } @output = @$p; last PERM } return \@output } sub SO_fixed { my $input_ref = shift; my %counts; ++$counts{ $_ } for @$input_ref; my @strings = shuffle keys %counts; LOOP: { my $any = 0; for my $string ( keys( %counts ) ) { next if $counts{ $string } == 1; $counts{ $string } --; $any = 1; my @safe = grep { $_ == 0 || $strings[ $_ - 1 ] ne $string + } grep { $_ == @strings || $strings[ $_ ] ne $string + } 0 .. @strings; return undef unless @safe; my $pick = $safe[ rand( @safe ) ]; splice( @strings, $pick, 0, $string ); } redo LOOP if $any } return \@strings } sub my_shuffle { my $input_ref = shift; my @output; my %counts; $counts{ $_ } ++ for @$input_ref; my ( $single, $multi ) = part { $counts{ $_ } > 1 } keys %counts; my @multi = sort { $counts{ $b } <=> $counts{ $a }} @$multi; my @pool = ( 0 .. $#$input_ref ); for my $str ( @multi ) { my $count = $counts{ $str }; my @take; if ( $count <= @pool / 2 ) { # case 1 my @excess = sort { $a <=> $b } samples( 2 * $count, @pool + ); my $n = int rand 2; my @idx = grep { $n ^ $_ % 2 } 0 .. $#excess; @take = @excess[ @idx ]; } elsif ( 2 * $count - 1 == @pool ) { # case 2 #die 'This code is unreachable for realistic input'; my @idx = grep { not $_ % 2 } 0 .. $#pool; @take = @pool[ @idx ]; } else { # case 3 #die 'This code is unreachable for realistic input'; my $prev = -2; my @ok = grep { my $res = $_ - $prev; $prev = $_; $res > 1 } @pool; return undef if $count > @ok; @take = samples( $count, @ok ); } @pool = singleton @pool, @take; @output[ $_ ] = $str for @take; } @output[ @pool ] = @$single if @pool; return \@output; } __END__ (warning: too few iterations for a reliable count) Rate SO_fixed my_shuffle SO_fixed 2.29/s -- -95% my_shuffle 42.7/s 1763% --
|
---|