in reply to Most efficient record selection method?
For example, we have 5 comma separated data file with 20 fields and we have been asked to show all variants contained in each of 3 of the fields over the 5 files in the least number of records.
Variants of what? Field values? Field value combinations? Variants relative to a canonical set of values? This isn't very clear to me.
Also, do you really want the smallest possible sampling of records, or just to get rid of records for which all values have already been seen? Consider the sample [A1,B1],[A1,B2],[A2,B1],[A2,B2]. If you only want to get rid of records whose values have all been seen then, assuming you don't care about combinations, [A1,B1],[A1,B2],[A2,B1] (a la CountZero's solution) is ok. If you really want the minimum number of records, then the right solution would be [A1,B1],[A2,B2], as pointed out by Kyle, and we've got a much more difficult optimization problem.
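To make the two readings concrete, here is a rough sketch in Python (chosen only for illustration; it is not the code I removed and not CountZero's or Kyle's actual code). The function names `drop_fully_seen` and `smallest_cover` are made up for this example, and I am assuming that a value only counts as "seen" within its own field. The second function is a brute-force search, which looks to me like a set cover problem, so treat it as usable only on tiny samples.

```python
from itertools import combinations

# The four-record sample from the paragraph above.
records = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]

def values_of(record):
    # Pair each value with its field position, so the same string in two
    # different fields counts as two different values (an assumption).
    return {(i, v) for i, v in enumerate(record)}

def drop_fully_seen(records):
    # Reading 1: keep a record only if it shows at least one new value.
    seen, kept = set(), []
    for rec in records:
        vals = values_of(rec)
        if not vals <= seen:
            kept.append(rec)
            seen |= vals
    return kept

def smallest_cover(records):
    # Reading 2: try every subset, smallest first, until one covers all values.
    target = set().union(*(values_of(r) for r in records))
    for size in range(1, len(records) + 1):
        for subset in combinations(records, size):
            if set().union(*(values_of(r) for r in subset)) == target:
                return list(subset)
    return []

print(drop_fully_seen(records))  # three records survive
print(smallest_cover(records))   # two records are enough
```

The first function makes a single pass and its result depends on record order; the second does not care about order, but the number of subsets it may try grows exponentially with the number of records.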
Best, beth
Update: removed the code sample because it failed to solve the problem raised by Kyle, and added a question about what the OP really means by "least".
Re^2: Most efficient record selection method?
by Kraythorne (Sexton) on Feb 13, 2009 at 12:09 UTC
by ELISHEVA (Prior) on Feb 13, 2009 at 13:48 UTC