Hmm, that's not quite right. You've not eliminated the chance of duplicate values in the output. You want something more like:
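{
    my %selected_set;
    # Hash keys are unique, so picking the same value twice is harmless.
    my $choose_one = sub { $selected_set{ $input[rand @input] } = 1 };
    # Keep picking until we have enough distinct values.
    $choose_one->() while keys %selected_set < $choose_count;
    my @selected = keys %selected_set;
}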
The problem of non-termination is indeed something that will bite you when you least expect it. I believe it can only be solved probabilistically in the absence of a complete scan of the input, either by shuffle or by calculating a histogram of the set of input values somehow. In a workflow situation I'd probably try to get the histogram precalculated for me, and then you can actually use the numerical weights to make your selection, since this scales better to large weights than duplicating input.
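
To make the histogram idea concrete, here is a rough sketch of weighted selection without duplicates. The %histogram contents and the name pick_weighted_distinct are made up for illustration; I'm assuming the histogram maps each distinct value to a positive count. Each pick removes a key, so the loop always terminates.

```perl
use strict;
use warnings;

# Hypothetical precalculated histogram: distinct value => number of occurrences.
my %histogram = (apple => 5, banana => 1, cherry => 3);

# Pick up to $count distinct values, each draw weighted by occurrence count.
sub pick_weighted_distinct {
    my ($hist, $count) = @_;
    my %remaining = %$hist;          # copy, so the caller's histogram is untouched
    my @picked;
    while (@picked < $count && %remaining) {
        my $total = 0;
        $total += $_ for values %remaining;
        my $target = rand $total;    # uniform in [0, $total)
        my $chosen;
        for my $value (sort keys %remaining) {
            $chosen = $value;
            $target -= $remaining{$value};
            last if $target < 0;     # landed inside this value's weight band
        }
        push @picked, $chosen;
        delete $remaining{$chosen};  # a value can never be picked twice
    }
    return @picked;
}

my @selected = pick_weighted_distinct(\%histogram, 2);
```

If you ask for more values than the histogram has keys, this simply returns all of them instead of hanging.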

So in the absence of that kind of knowledge, I see two ways of reducing the probability of a hang. The first is to use a shuffle for small datasets and random selection for large ones, where the small/large cutoff can be arbitrary, or determined dynamically by scanning the front of the dataset to make sure there are "enough" different values.
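
Something like the following is what I have in mind for the size-based split. Treat it as a sketch: $SMALL_LIMIT and choose_distinct are names I've invented here, and the cutoff of 1000 is arbitrary. shuffle comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $SMALL_LIMIT = 1000;    # hypothetical cutoff between "small" and "large"

sub choose_distinct {
    my ($input, $choose_count) = @_;
    my %selected_set;
    if (@$input <= $SMALL_LIMIT) {
        # Small input: walk a shuffled copy once. This always terminates,
        # even when there are fewer distinct values than requested.
        for my $value (shuffle @$input) {
            $selected_set{$value} = 1;
            last if scalar(keys %selected_set) >= $choose_count;
        }
    }
    else {
        # Large input: random probing. This can still spin forever if the
        # input holds fewer than $choose_count distinct values.
        $selected_set{ $input->[rand @$input] } = 1
            while scalar(keys %selected_set) < $choose_count;
    }
    return keys %selected_set;
}
```

The large branch is still only probabilistically safe, which is why the cutoff (or a scan of the front of the data) matters.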

The second probabilistic method is to count how many times you've made a random selection, and give up if the number of attempts far outweighs the number of desired values (and maybe print a warning, so you know why your program now takes five seconds to run instead of five microseconds). But running for five seconds and producing some output is a lot better than running forever and producing no output.
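
For the attempt-counting approach, a minimal sketch might look like this. The name choose_with_limit and the default of 100 attempts per desired value are my own assumptions; tune the limit to taste. It uses //=, so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

sub choose_with_limit {
    my ($input, $choose_count, $max_attempts) = @_;
    return unless @$input;                     # nothing to choose from
    $max_attempts //= 100 * $choose_count;     # hypothetical default limit
    my %selected_set;
    my $attempts = 0;
    while (scalar(keys %selected_set) < $choose_count) {
        if (++$attempts > $max_attempts) {
            # Give up, but say why, and return whatever we did find.
            warn "gave up after $max_attempts attempts; returning only "
               . scalar(keys %selected_set)
               . " of $choose_count values\n";
            last;
        }
        $selected_set{ $input->[rand @$input] } = 1;
    }
    return keys %selected_set;
}
```

When the input has too few distinct values you get a short result plus a warning, rather than a program that never comes back.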


In reply to Re^2: removing the goto by TimToady
in thread removing the goto by scoobyrico
