sjaak has asked for the wisdom of the Perl Monks concerning the following question:
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Select a random record
by ncw (Friar) on Sep 05, 2000 at 00:04 UTC | |
Then you could pick a random number from 0..$n-1 - call it $rand and using the 2 paramter form of LIMIT select it, eg This should be very quick and efficient. You should ORDER BY the primary key probably for maximum speed (where it says id above). This method has the advantage that it will work with a WHERE clause on the the SELECTs), eg WHERE setup=1 or something like that This two parameter LIMIT might be MySQL specific I don't know (it doesn't say it is in the manual though) Update: I tried this on a large MySQL table (with 100ish columns and 500,000ish rows). Leaving out ORDER BY made only a very small difference. The first SQL statement was instant. The second took 8 seconds varying by $rand so maybe it is doing a full table scan here :-( | [reply] [d/l] [select] |
by elwarren (Priest) on Sep 05, 2000 at 02:48 UTC | |
| [reply] |
by Maclir (Curate) on Sep 05, 2000 at 00:57 UTC | |
| [reply] |
|
Re: Select a random record
by Maclir (Curate) on Sep 05, 2000 at 04:51 UTC | |
As I see it, there are two objectives; the first being able to select a random record, the second being able to do this in a relatively non-"anti-social" manner. You do not want to incur the wrath of your DBA, Sysadmin, IS manager, and so on. The most efficient way to retreive any sing record from a table is to use the "WHERE" clause on a primary key (or an alternate key where an index has been set up). Even a DBMS with an atrocious optimiser will do this without any difficulty. One method would be to add another column - say "sequence_number", make it NOT NULL and UNIQUE, and set it as an alternate key. However, you will have to make sure that as records are added, this column gets the next number in sequence. Plus, where records are deleted, what then happens . . . the maintenance hassle is too nasty to contemplate. Some DBMS's allow a special data type of "ROWID" or "SERIAL", which may allow you to have the system look after this for you. I know there is a ROWID with Informix, but I cannot find the right manual to explain if that is a proper sequence of integers. Besides, that is not ANSI standard SQL. Another option may be to use a cursor - but this is stretching my limited data base knowledge. Something to look at though. Ken | [reply] |
by sjaak (Novice) on Sep 08, 2000 at 09:07 UTC | |
Thanks for the idea, this is what I will use. The database does not exist yet and I am free to add as much columns as I like (I am the DBA). I will add a column in the table containing the serial_nr. Then, with a SELECT COUNT(*) FROM table and a SELECT record FROM table WHERE table.serial_nr = random_nr. If the record should appear to be deleted (not likely), then I re-select a record. greetings, sjaak | [reply] |
|
Re: Select a random record
by elwarren (Priest) on Sep 05, 2000 at 02:44 UTC | |
What you should do is find out which column on your table is the primary key. If there is no primary key then you will need a unique column. Use this column to build a list, you could stuff it in an array if it's not too large. Pick a random number and look it up via the array. Now run your select against the table to return this row. It will be worlds faster. | [reply] |
|
Re: Select a random record
by nop (Hermit) on Sep 05, 2000 at 05:10 UTC | |
As Ken suggested, create a row id. Keep track of the maximum rowid, call it M. To get a random row, choose a random integer in 1..M and then select the corresponding row. If you start deleting rows, then you'll have holes: the table will have N rows labeled 1..M, with N<M. In this case, choose your random int from 1..M and select the corresponding row. If there's no row there, just choose another random int and try again. This introduces no bias; the row eventually returned will be uniformly chosen from all the rows. If N/M is large (near 1, few holes), this approach works fine. When N/M degrades, you'll waste too much time shooting into holes. When N/M is unacceptably low (say less than .9), lock the table and reassign the rowids sequentially. | [reply] |
|
Re: Select a random record
by lhoward (Vicar) on Sep 05, 2000 at 05:10 UTC | |
Add a column to your table, I'm gonna call mine FOO in this example. Index FOO. You could do this with any type of value (integer, varchar, float/real, etc..). Assign every record currently in the table a "random" value within a range (say you decided that foo was a CHAR(10) you could assign it values from 'aaaaaaaaaa' to 'zzzzzzzzzz'). Whenevere you insert a new row into the DB assign it a random value in FOO as well. Now whenever you want a random record just generate a random seed (using the same range and algorithm you used to seed the table originally) then select * from BAR where FOO >= '$seed'. Don't fetch back all the rows (you could use LIMIT if your dialect of SQL supports it). Fetch them rows from the query back one-at-a-time while FOO has a constant value. If you got back only 1 row, then you have your match. If you got back multiple rows then pick randomly among them. If you do not pick an apropriate size for FOO and a valid randomizing function and your table grows much larger than expected this could become inefficient. But it should work good in some circumstances. | [reply] |
by nop (Hermit) on Sep 05, 2000 at 05:14 UTC | |
| [reply] |
|
Re: Select a random record
by Anonymous Monk on Sep 05, 2000 at 07:21 UTC | |
| [reply] |
|
Re: Select a random record
by jptxs (Curate) on Sep 05, 2000 at 05:20 UTC | |
UPDATE: on second thought, if you DB supports DISTINCT and ROWNUM you could do something like this: fname is a column i know will only have so many values with D in them, use a col like this in your own records. I get all of those, dictinctly to avoid duplicates, and then add that I only want one of the rows that result from this. if you use a number column and a rand number, you should be good to go. | [reply] [d/l] |