lestad has asked for the wisdom of the Perl Monks concerning the following question:

I have a project in which I would like to use perl to pick x number of ss# from a list in a mysql database. The mysql database will have about 2500 names comprised of simply 3 columns ( firstname,lastname,ss#). I would like to pick a number such as 1000 and have it generate a random list of ss#'s from the list; then print only those selected to a file for distibution. I know the simple things like using dbi to connect perl to mysql but is the rest even possible in the above scenario. Thanks in advance for any assistance.

Replies are listed 'Best First'.
Re: Random Picks
by tune (Curate) on May 09, 2001 at 17:21 UTC
    Load all the ss#'s into an array first, then pick randomly and load 1000 of them into another array. It is not that CPU consuming if you have only 2500 records.
    The following code snippet will be useful:

    $array[rand(@array)]

    Be careful to not duplicate a pick. Maybe you could load the picks into a hash's key and investigate if you reached the required 1000 element or still under that.
    Good luck!

    --
    tune

      To not duplicate a pick you do a Fisher-Yates shuffle and stop after N steps:

      my @ssn= getAllSSNs(); my $want= 1000; for( 0..$want-1 ) { my $choice= $want + rand(@ssn-$want); @ssn[$_,$choice]= @ssn[$choice,$_]; } return @ssn[0..$want-1];

              - tye (but my friends call me "Tye")
Re: Random Picks
by Masem (Monsignor) on May 09, 2001 at 17:39 UTC
    You can use a relatively new feature of MySQL to do this (no perl needed):
    SELECT ssn FROM ssn_table ORDER BY RAND LIMIT 3
    That is, the ordering by RAND is the trick. I forget which version of MySQL introduces it, but if you don't have it, the other suggestions presented are a bit more portable.


    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      I tried this with a 250,000 records mysql table on a really fast machine (one of our production servers), and 20 random records costed 17 seconds to process, so be very careful!

      --
      tune

Re: Random Picks
by knobunc (Pilgrim) on May 09, 2001 at 18:00 UTC

    [Edited to explain the theory more completely.]

    Lifted from the Perl Cookbook recipe 8.6 'Picking a Ranom Line from a File' generalized out to N lines rather than 1. Most deep thought comes from Messrs. Christiansen and Torkington. All errors come from me.

    # Call srand if you are on an old perl (< 5.004) my $statement = 'SELECT ssn FROM ...'; my $sth = $dbh->prepare($statement); $sth->execute(); my $target_lines = 1000; my @result = (); my $item_num = 1; while (my ($ssn) = $sth->fetchrow_array()) { if (@result < $target_lines) { # Just add it if we have not hit the limit push @result, $ssn; } elsif (rand($item_num) < $target_lines) { # Otherwise pick something to replace if the # the odds are right $result[rand($target_lines)] = $ssn; } } continue { $count++; }

    You can look in the Cookbook for the full description of why this works for a single target_line. The short form is that it will first fill the list with $target_lines elements, then each subsequent line will have a slightly lessened chance of replacing a predecessor. Basically every element in the array should have the same probability of being there at a given point in time. So until the 1001st element each item has a 100% chance of being there. Then the 1001st element has a 1001/1000 chance of replacing something, then there is a 1/1000 chance for each element. Adding that all up it means each element will have a 1001/1000 chance of staying. Then for the 1002nd element, it has a 1002/1000 chance of replacing an element. Again that all adds up to 1002/1000 (1/1000 + 1001/1000) for each element.

    The standard caveats about the randomness of the random numbers returned from random apply.

    This solution is good if you don't want to slurp all elements in from the DB and if you have to use standard SQL (i.e. no limit clause).

    -ben

Re: Random Picks
by Chady (Priest) on May 09, 2001 at 17:22 UTC

    I think this is what u're talking about.. you're being vague by the way...

    $howmany = 1000; # this is the variable to hold the x number # assuming that you can get the number of ss#'s in some variable $numb +erOfss for (0..$howmany) { my $ra = int(rand($numberOfss)); print $ra,"\n"; # do whatever you want with it here.. }

    and then... probably not..

    Update: added a comment in the code.
    He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

    Chady | http://chady.net/
Re: Random Picks
by chipmunk (Parson) on May 09, 2001 at 18:13 UTC
    This code does the same thing as knobunc's, but it also keeps the randomly selected lines in their original order:
    my $select = 1000; my @ssns; while (my($ssn) = $sth->fetchrow_array()) { push @ssns, $_ if rand() < ($select / ++$count); if (@ssns > $select) { splice(@ssns, rand($#ssns), 1); } }
    This is less efficient, though, because it has to move the elements of the @ssns array down every time it splices one out. knobunc's is O(n), while this is O(n*m), where n is the total number of items and m is the number of items to select. (I hope I got that right! :)
Re: Random Picks
by tune (Curate) on May 09, 2001 at 17:29 UTC
    I have found an interesting solution (just type "mysql random rows" into google). It is fantastic, and fast!

    1. Add a FLOAT column to the table:
      ALTER TABLE random_selectable ADD random FLOAT;
    2. Fill this column with random numbers:
      UPDATE random_selectable SET random = RAND();
    3. Get 1000 values:
      SELECT * FROM random_selectable ORDER BY random LIMIT 1000;

    --
    tune

      Whoa! Way too complex there! Try this
      select * from table order by rand() limit 1000;
      There's no need to modify the table just to select a row at random from it.
Re: Random Picks
by srawls (Friar) on May 09, 2001 at 17:32 UTC
    First of all, I am in no way an expert on random numbers (well, psuedo-random numbers) - but this should suffice.

    To generate the psuedo-random numbers, use this:

    push @someArray, int(rand $tot) for(1 .. $x);
    Where $x is the number of ss#'s you want, and $tot is the number of rows in the database.

    The 15 year old, freshman programmer,
    Stephen Rawls
Re: Random Picks
by lestad (Initiate) on May 15, 2001 at 20:04 UTC
    Thank you all for your assistance with my problem. I don't know how I missed such a simple and easy approach,

    select * from table order by rand() limit 1000

    especially after I read the Paul Dubois book on mysql. I will use this approach to complete my task but to further to my knowledge of perl I will try the other appoaches as well. Thanks again to all who responded.