Select a random record

sjaak has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Select a random record by ncw (Friar) on Sep 05, 2000 at 00:04 UTC
You could count the number of records using `SELECT count() FROM table` [download] Then you could pick a random number from 0..$n-1 - call it $rand and using the 2 paramter form of LIMIT select it, eg `SELECT FROM table ORDER BY id LIMIT $rand, 1` [download] This should be very quick and efficient. You should ORDER BY the primary key probably for maximum speed (where it says id above). This method has the advantage that it will work with a WHERE clause on the the SELECTs), eg WHERE setup=1 or something like that This two parameter LIMIT might be MySQL specific I don't know (it doesn't say it is in the manual though) Update: I tried this on a large MySQL table (with 100ish columns and 500,000ish rows). Leaving out ORDER BY made only a very small difference. The first SQL statement was instant. The second took 8 seconds varying by $rand so maybe it is doing a full table scan here :-(	[reply] [d/l] [select]
RE: Re: Select a random record by elwarren (Priest) on Sep 05, 2000 at 02:48 UTC
Order by is only going to add overhead. If he wants a random record it will be faster to return them in their stored order.	[reply]
RE: Re: Select a random record by Maclir (Curate) on Sep 05, 2000 at 00:57 UTC
Looking through my Informix SQL reference manual, there is no reference to the "LIMIT" parameter.	[reply]
Re: Select a random record by Maclir (Curate) on Sep 05, 2000 at 04:51 UTC
I have applied some more thought to your problem. I have never used MySql, only Oracle and Informix, and not even anywhere approaching DBA expertise. So take my suggestions with a hefty grain of salt . . . As I see it, there are two objectives; the first being able to select a random record, the second being able to do this in a relatively non-"anti-social" manner. You do not want to incur the wrath of your DBA, Sysadmin, IS manager, and so on. The most efficient way to retreive any sing record from a table is to use the "WHERE" clause on a primary key (or an alternate key where an index has been set up). Even a DBMS with an atrocious optimiser will do this without any difficulty. One method would be to add another column - say "sequence_number", make it NOT NULL and UNIQUE, and set it as an alternate key. However, you will have to make sure that as records are added, this column gets the next number in sequence. Plus, where records are deleted, what then happens . . . the maintenance hassle is too nasty to contemplate. Some DBMS's allow a special data type of "ROWID" or "SERIAL", which may allow you to have the system look after this for you. I know there is a ROWID with Informix, but I cannot find the right manual to explain if that is a proper sequence of integers. Besides, that is not ANSI standard SQL. Another option may be to use a cursor - but this is stretching my limited data base knowledge. Something to look at though. Ken	[reply]
RE: Re: Select a random record by sjaak (Novice) on Sep 08, 2000 at 09:07 UTC
Ken, Thanks for the idea, this is what I will use. The database does not exist yet and I am free to add as much columns as I like (I am the DBA). I will add a column in the table containing the serial_nr. Then, with a SELECT COUNT(*) FROM table and a SELECT record FROM table WHERE table.serial_nr = random_nr. If the record should appear to be deleted (not likely), then I re-select a record. greetings, sjaak	[reply]
Re: Select a random record by elwarren (Priest) on Sep 05, 2000 at 02:44 UTC
Selecting all the records out of a database just to choose a random one will definitely have the DBA breathing down your neck. I haven't used mysql much, so I can't comment on the LIMIT parameter. What you should do is find out which column on your table is the primary key. If there is no primary key then you will need a unique column. Use this column to build a list, you could stuff it in an array if it's not too large. Pick a random number and look it up via the array. Now run your select against the table to return this row. It will be worlds faster.	[reply]
Re: Select a random record by nop (Hermit) on Sep 05, 2000 at 05:10 UTC
I agree with Ken. His concern that deletes will cause trouble is pretty minor, and easily handled. As Ken suggested, create a row id. Keep track of the maximum rowid, call it M. To get a random row, choose a random integer in 1..M and then select the corresponding row. If you start deleting rows, then you'll have holes: the table will have N rows labeled 1..M, with N<M. In this case, choose your random int from 1..M and select the corresponding row. If there's no row there, just choose another random int and try again. This introduces no bias; the row eventually returned will be uniformly chosen from all the rows. If N/M is large (near 1, few holes), this approach works fine. When N/M degrades, you'll waste too much time shooting into holes. When N/M is unacceptably low (say less than .9), lock the table and reassign the rowids sequentially.	[reply]
Re: Select a random record by lhoward (Vicar) on Sep 05, 2000 at 05:10 UTC
If you're willing to add a column to your table you could use this approach: Add a column to your table, I'm gonna call mine FOO in this example. Index FOO. You could do this with any type of value (integer, varchar, float/real, etc..). Assign every record currently in the table a "random" value within a range (say you decided that foo was a CHAR(10) you could assign it values from 'aaaaaaaaaa' to 'zzzzzzzzzz'). Whenevere you insert a new row into the DB assign it a random value in FOO as well. Now whenever you want a random record just generate a random seed (using the same range and algorithm you used to seed the table originally) then *select from BAR where FOO >= '$seed'**. Don't fetch back all the rows (you could use LIMIT if your dialect of SQL supports it). Fetch them rows from the query back one-at-a-time while FOO has a constant value. If you got back only 1 row, then you have your match. If you got back multiple rows then pick randomly among them. If you do not pick an apropriate size for FOO and a valid randomizing function and your table grows much larger than expected this could become inefficient. But it should work good in some circumstances.	[reply]
RE: Re: Select a random record by nop (Hermit) on Sep 05, 2000 at 05:14 UTC
The key to lhoward's approach is to index FOO. This is a critical detail: without an index, you're tablescanning again.	[reply]
Re: Select a random record by Anonymous Monk on Sep 05, 2000 at 07:21 UTC
select key from (select key, count(1) 'seq' from table) where seq = $RAND or the equivalent using temporary tables. It's what databases do, so it will suck to build the temp table for large databases. If you can get the DBA to add a sequence number, then use that instead of the temp table. If your sequence number can have holes, then get the first record where seq >= $RAND. That's assuming your holes have even distribution also. barring that, I see no alternative to reading a cursor until your number is up (from a 'select key, count(1) from table' query. Do a 'select count(1) from table' to determine the number of records in the table. BTW, reading from the cursor until you get to the record you want (i.e. a sequential search) happens all the time in SQL. Its just called a 'table scan' instead, and happens when there's not an available index to locate matching records for a where clause.	[reply]
Re: Select a random record by jptxs (Curate) on Sep 05, 2000 at 05:20 UTC
I'm an Oracle guy, and if the DB is Oracle Maclir's suggestion is best by far. UPDATE: on second thought, if you DB supports DISTINCT and ROWNUM you could do something like this: `select distinct fname from jonathan.sc_name where fname like '%D%' and + rownum = 1;` [download] fname is a column i know will only have so many values with D in them, use a col like this in your own records. I get all of those, dictinctly to avoid duplicates, and then add that I only want one of the rows that result from this. if you use a number column and a rand number, you should be good to go.	[reply] [d/l]