[Edited to explain the theory more completely.]

Lifted from the Perl Cookbook recipe 8.6 'Picking a Ranom Line from a File' generalized out to N lines rather than 1. Most deep thought comes from Messrs. Christiansen and Torkington. All errors come from me.

# Call srand if you are on an old perl (< 5.004) my $statement = 'SELECT ssn FROM ...'; my $sth = $dbh->prepare($statement); $sth->execute(); my $target_lines = 1000; my @result = (); my $item_num = 1; while (my ($ssn) = $sth->fetchrow_array()) { if (@result < $target_lines) { # Just add it if we have not hit the limit push @result, $ssn; } elsif (rand($item_num) < $target_lines) { # Otherwise pick something to replace if the # the odds are right $result[rand($target_lines)] = $ssn; } } continue { $count++; }

You can look in the Cookbook for the full description of why this works for a single target_line. The short form is that it will first fill the list with $target_lines elements, then each subsequent line will have a slightly lessened chance of replacing a predecessor. Basically every element in the array should have the same probability of being there at a given point in time. So until the 1001st element each item has a 100% chance of being there. Then the 1001st element has a 1001/1000 chance of replacing something, then there is a 1/1000 chance for each element. Adding that all up it means each element will have a 1001/1000 chance of staying. Then for the 1002nd element, it has a 1002/1000 chance of replacing an element. Again that all adds up to 1002/1000 (1/1000 + 1001/1000) for each element.

The standard caveats about the randomness of the random numbers returned from random apply.

This solution is good if you don't want to slurp all elements in from the DB and if you have to use standard SQL (i.e. no limit clause).

-ben


In reply to Re: Random Picks by knobunc
in thread Random Picks by lestad

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.