The length of the generated data is a parameter that does not modify the ranking because the costs of all the given solutions are linear to it.

If you change the set size or the maximun number of consecutive repetitions instead, then the ranking is not so stable. The benchmarks below show it.

I have also added a new solution salva2 (much faster in average) and some variations of almut's one.

The code:

ur @set; our $max_reps; our $len; sub gen_almut { my $result = ''; while (length($result) < $len) { my $char = $set[ rand @set ]; next if substr($result, -$max_reps) eq $char x $max_reps; $result .= $char; } $result; } sub gen_almut_mod2 { my $result = ''; for (1..$len) { my $char = $set[ rand @set ]; redo if substr($result, -$max_reps) eq $char x $max_reps; $result .= $char; } $result; } sub gen_almut_mod3 { my $result = ''; my $char; for (1..$len) { do { $char = $set[ rand @set ] } while substr($result, -$max_reps) eq $char x $max_reps; $result .= $char; } $result; } sub gen_almut_mod4 { my $result = ''; while (length($result) < $len) { my $char = $set[ rand @set ]; $result .= $char if substr($result, -$max_reps) ne $char x $max_reps; } $result; } my $buf; sub gen_salva2 { while (length $buf < $len) { $buf .= $set[rand @set] for (1..$len*100); $buf =~ s/(.)\1{$max_reps,}/$1 x $max_reps/ge; } substr($buf, -$len, $len, ''); } sub gen_salva { my $out = ''; my $last_ix = -1; my $reps = 0; for (1..$len) { my $ix; if ($reps >= $max_reps) { $ix = int rand(@set - 1); $ix++ if $ix >= $last_ix; } else { $ix = int rand(@set); } if ($ix == $last_ix) { $reps++; } else { $last_ix = $ix; $reps = 1; } $out .= $set[$ix]; } $out; } use Benchmark qw(cmpthese); $len = 12; for my $elements (2, 5, 10) { @set = (1..$elements); for $max_reps (1, 2, 3) { print "elements: $elements max_reps: $max_reps\n"; $buf = ''; cmpthese(-4, { almut => \&gen_almut, almut_mod2 => \&gen_almut_mod2, almut_mod3 => \&gen_almut_mod3, almut_mod4 => \&gen_almut_mod4, salva2 => \&gen_salva2, salva => \&gen_salva } ); } }

And the benchmarking results:

elements: 2 max_reps: 1 Rate almut almut_mod4 almut_mod3 almut_mod2 salva2 + salva almut 37325/s -- -2% -10% -10% -13% + -28% almut_mod4 38188/s 2% -- -8% -8% -11% + -26% almut_mod3 41453/s 11% 9% -- -0% -3% + -20% almut_mod2 41553/s 11% 9% 0% -- -3% + -20% salva2 42707/s 14% 12% 3% 3% -- + -18% salva 51951/s 39% 36% 25% 25% 22% + -- elements: 2 max_reps: 2 Rate almut_mod3 almut almut_mod4 almut_mod2 salv +a salva2 almut_mod3 56662/s -- -2% -2% -3% -3 +% -15% almut 57823/s 2% -- -0% -1% -1 +% -14% almut_mod4 58097/s 3% 0% -- -0% -1 +% -13% almut_mod2 58234/s 3% 1% 0% -- -1 +% -13% salva 58651/s 4% 1% 1% 1% - +- -12% salva2 66988/s 18% 16% 15% 15% 14 +% -- elements: 2 max_reps: 3 Rate salva almut_mod3 almut almut_mod4 almut_mod +2 salva2 salva 60150/s -- -0% -8% -9% -10 +% -24% almut_mod3 60258/s 0% -- -8% -9% -10 +% -24% almut 65430/s 9% 9% -- -1% -2 +% -18% almut_mod4 65946/s 10% 9% 1% -- -1 +% -17% almut_mod2 66827/s 11% 11% 2% 1% - +- -16% salva2 79446/s 32% 32% 21% 20% 19 +% -- elements: 5 max_reps: 1 Rate salva almut almut_mod3 almut_mod4 almut_mod +2 salva2 salva 51954/s -- -8% -9% -10% -14 +% -23% almut 56479/s 9% -- -1% -2% -7 +% -16% almut_mod3 56799/s 9% 1% -- -1% -6 +% -15% almut_mod4 57418/s 11% 2% 1% -- -5 +% -15% almut_mod2 60666/s 17% 7% 7% 6% - +- -10% salva2 67156/s 29% 19% 18% 17% 11 +% -- elements: 5 max_reps: 2 Rate salva almut_mod3 almut_mod2 almut_mod4 almu +t salva2 salva 57703/s -- -11% -18% -18% -18 +% -34% almut_mod3 65188/s 13% -- -7% -7% -7 +% -25% almut_mod2 70102/s 21% 8% -- -0% -0 +% -19% almut_mod4 70338/s 22% 8% 0% -- -0 +% -19% almut 70435/s 22% 8% 0% 0% - +- -19% salva2 87051/s 51% 34% 24% 24% 24 +% -- elements: 5 max_reps: 3 Rate salva almut_mod3 almut almut_mod2 almut_mod +4 salva2 salva 59097/s -- -11% -17% -17% -18 +% -36% almut_mod3 66428/s 12% -- -7% -7% -8 +% -28% almut 71086/s 20% 7% -- -0% -2 +% -23% almut_mod2 71117/s 20% 7% 0% -- -2 +% -23% almut_mod4 72395/s 23% 9% 2% 2% - +- -21% salva2 91896/s 56% 38% 29% 29% 27 +% -- elements: 10 max_reps: 1 Rate salva almut_mod3 almut_mod2 almut almut_mod +4 salva2 salva 51468/s -- -17% -22% -26% -26 +% -39% almut_mod3 61696/s 20% -- -7% -11% -11 +% -27% almut_mod2 66266/s 29% 7% -- -4% -4 +% -21% almut 69256/s 35% 12% 5% -- -0 +% -18% almut_mod4 69269/s 35% 12% 5% 0% - +- -18% salva2 84117/s 63% 36% 27% 21% 21 +% -- elements: 10 max_reps: 2 Rate salva almut_mod3 almut_mod2 almut_mod4 almu +t salva2 salva 57688/s -- -12% -20% -25% -26 +% -41% almut_mod3 65386/s 13% -- -9% -15% -17 +% -33% almut_mod2 72116/s 25% 10% -- -6% -8 +% -26% almut_mod4 77021/s 34% 18% 7% -- -2 +% -21% almut 78381/s 36% 20% 9% 2% - +- -19% salva2 97336/s 69% 49% 35% 26% 24 +% -- elements: 10 max_reps: 3 Rate salva almut_mod3 almut_mod2 almut_mod4 almu +t salva2 salva 59236/s -- -9% -18% -25% -25 +% -39% almut_mod3 65161/s 10% -- -10% -17% -17 +% -33% almut_mod2 72259/s 22% 11% -- -8% -8 +% -26% almut_mod4 78569/s 33% 21% 9% -- -0 +% -19% almut 78580/s 33% 21% 9% 0% - +- -19% salva2 97565/s 65% 50% 35% 24% 24 +% --
Note that the salva2 algorithm has some bias. It is able to generate all the valid solutions but not all of them have the same probability.

In reply to Re^5: Random data generation. by salva
in thread Random data generation. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.