Re: Random data generation.

ikegami's is a general solution for all cases, but would it be sufficient in non-extreme cases (i.e., not trying to generate 100-character strings from just two different characters and with no more than two sequential repeats) to just generate bunches of strings and throw away all that are non-compliant?

>perl -wMstrict -le
"use List::Util qw(shuffle);
 my @set = qw(a b c d e f);
 my $len = 12;
 my $needed = shift;
 my @out;
 while (@out < $needed) {
   my $str = join '', shuffle((@set) x $len);
   push @out, grep !m{ (.)\1\1 }xms, $str =~ m{ .{$len} }xmsg;
   }
 @out = @out[-$needed .. -1];
 m{ (.)\1\1 }xms and die qq{3+ same: '$_'} for @out;
 printf qq{%3d: '$out[$_]' \n}, 1+$_ for 0 .. $#out;
" 5
  1: 'becaecdbbedc'
  2: 'afccbecefdca'
  3: 'fbffecacfeaa'
  4: 'ddbacddfafca'
  5: 'fbebdabadedc'


Re^2: Random data generation. by ikegami (Patriarch) on Jun 26, 2010 at 15:33 UTC
You have it backwards. The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away. What you call the non-extreme case is where your algorithm has the most problems. The real problem is that your solution isn't random. The chance of picking a character at a certain position is affected by the characters picked in previous positions.
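One way to address the randomness objection while keeping the generate-and-discard idea is to pick every character independently instead of shuffling a fixed pool. The following is a minimal sketch, not code from the thread: the helper names `random_string` and `generate_compliant` are hypothetical, and it assumes the goal is that every compliant string should be equally likely.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build one candidate string by
# picking each character independently and uniformly from @$set.
sub random_string {
    my ($set, $len) = @_;
    return join '', map { $set->[ rand @$set ] } 1 .. $len;
}

# Hypothetical helper: keep drawing candidates until $needed of them pass
# the "no character three or more times in a row" check.
sub generate_compliant {
    my ($set, $len, $needed) = @_;
    my @out;
    while (@out < $needed) {
        my $str = random_string($set, $len);
        push @out, $str unless $str =~ m{ (.)\1\1 }xms;
    }
    return @out;
}

my @strings = generate_compliant([qw(a b c d e f)], 12, 5);
printf "%3d: '%s'\n", $_ + 1, $strings[$_] for 0 .. $#strings;
```

Because each candidate is drawn uniformly from all possible strings and is either kept or discarded whole, the kept strings are uniformly distributed over the compliant ones. The cost is the same as in the shuffle version: with a small set and long strings, most candidates are thrown away.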
Re^3: Random data generation. by AnomalousMonk (Archbishop) on Jun 26, 2010 at 20:06 UTC
"The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away."
I thought that that, in essence, was implied by what I wrote.
"What you call the non-extreme case is where your algorithm has the most problems."
I take 'problems' to mean the generation of strings that don't meet the max-repeated-characters requirement and so must be thrown away. Suppose my algorithm were trying to generate strings of length 3 from a set of 100 characters (my idea of a very 'non-extreme' case). In the totally random case (OK, OK, my algorithm is only approximately random; but see below), about 1 time in 10,000 the result would be a string of three identical characters that needed to be discarded. Would this be more problematic than trying to generate strings of, say, length 100 from a set of three characters? I don't understand.
"The real problem is that your solution isn't random."
I agree it isn't completely random, but, as others have noted, it wasn't clear from the OP and subsequent discussion that complete randomness was required. I was aiming for something quick and dirty that would satisfy the max-repeated-characters requirement while still being sort of random-ish.
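The discard rates being compared here can be computed directly for the fully random case. The sketch below is an illustration, not code from the thread: `accept_probability` is a hypothetical helper, and it assumes each character is drawn independently and uniformly, which the shuffle-based one-liner only approximates. It tracks the probability that a compliant string ends in a run of one or of two identical characters, one position at a time.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): probability that a string of
# $len characters, each drawn independently and uniformly from $k distinct
# characters, contains no character three or more times in a row.
sub accept_probability {
    my ($k, $len) = @_;
    return 1 if $len < 1;
    # $run1 / $run2: probability that the string so far is compliant and
    # ends in a run of exactly one / exactly two identical characters.
    my ($run1, $run2) = (1, 0);
    for (2 .. $len) {
        ($run1, $run2) = (
            ($run1 + $run2) * ($k - 1) / $k,   # next character differs
            $run1 / $k,                        # next character repeats once
        );
    }
    return $run1 + $run2;
}

# Compare the "non-extreme" and "extreme" cases discussed above.
for my $case ([100, 3], [3, 100], [2, 100]) {
    my ($k, $len) = @$case;
    printf "set size %3d, length %3d: discard rate %.6g\n",
        $k, $len, 1 - accept_probability($k, $len);
}
```

For length 3 over 100 characters the discard rate is 1/100^2 = 0.0001, since the second and third characters must both match the first. The rate climbs steeply as the set shrinks and the strings grow.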
Re^4: Random data generation. by ikegami (Patriarch) on Jun 27, 2010 at 06:16 UTC
I had missed the "not" in "not trying" despite multiple readings.
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 28, 2010 at 12:12 UTC
See salva's second approach Re^5: Random data generation. for a couple of innovations that make this approach viable.