in reply to Creating a sub-list constrained by an average while maintaining maximum number of original elements

It can also be advantageous, in some cases (say, when dealing with “hundreds of thousands of files”), to take one of two approaches: sort the data first, or make a single pass through a randomized ordering of it.

Sorting is an unexpectedly fast and efficient approach to many problems, this one included: once the data is ordered, the elements that push the average up or down sit at the two ends of the list, so you can trim from whichever end is hurting you. (“Sorting and Searching” is, of course, the title of one of Dr. Knuth’s definitive textbooks.)
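As a rough sketch of that idea (the function name, `target`, and `tolerance` parameters are my own illustrative choices, not anything prescribed by the question): sort once, start with every element, and repeatedly drop the extreme value on whichever side is dragging the average away from the target. Because only one element is discarded per step, the surviving sub-list stays as large as possible.

```python
def trim_to_target_average(values, target, tolerance=0.01):
    """Greedy, sort-based heuristic: keep trimming the offending extreme
    until the average of the survivors is within `tolerance` of `target`."""
    pool = sorted(values)
    lo, hi = 0, len(pool)          # pool[lo:hi] is the surviving sub-list
    total = sum(pool)
    while lo < hi and abs(total / (hi - lo) - target) > tolerance:
        if total / (hi - lo) > target:
            hi -= 1                # average too high -> drop the largest survivor
            total -= pool[hi]
        else:
            total -= pool[lo]      # average too low -> drop the smallest survivor
            lo += 1
    return pool[lo:hi]
```

The sort costs O(n log n) and each trimming step is O(1), which is perfectly manageable even for hundreds of thousands of values.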

You also need to ponder the trade-off between “goodness of fit” and “diversity”: which of the two matters more to you? It could well be argued that the algorithm should consist of a single pass through the randomized file, adding each new record to the pool and, on every iteration, throwing out a randomly selected element chosen by the simple criterion of whether it is “larger than” or “smaller than” the target value. A sketch of that idea follows.
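Here is one possible reading of that scheme, written for numeric records (the function name, the `pool_size` parameter, and the eviction rule details are assumptions on my part): shuffle the input, fill a fixed-size pool, then for every remaining record add it and evict a random element from whichever side of the target is currently pulling the pool’s average off course. The random eviction is what preserves diversity; the side test is what nudges the fit.

```python
import random

def single_pass_pool(records, target, pool_size=1000, seed=None):
    """Single randomized pass: add each record, then evict one element at
    random from the side of `target` that the running average has drifted toward."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    pool = shuffled[:pool_size]
    total = sum(pool)
    for record in shuffled[pool_size:]:
        pool.append(record)
        total += record
        too_high = total / len(pool) > target
        # candidates live on the side currently dragging the average away
        candidates = [i for i, v in enumerate(pool) if (v > target) == too_high]
        victim = rng.choice(candidates) if candidates else rng.randrange(len(pool))
        total -= pool[victim]
        pool[victim] = pool[-1]    # swap-remove keeps the eviction O(1)
        pool.pop()
    return pool
```

Note that this variant keeps the pool at a fixed size rather than maximizing it, which is exactly the fit-versus-size tension the trade-off question is asking you to settle.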

It might be prudent to prototype several candidate algorithms and then choose among them based both on your understanding of the problem and on their observed statistical characteristics, for instance with a small simulation harness like the one sketched below.
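A minimal harness for that comparison might look like this (entirely illustrative, and assuming candidate functions with the same `(data, target)` signature as the two sketches above): run each candidate over several randomly generated data sets and report how many elements it keeps and how far its average lands from the target.

```python
import random
import statistics

def compare_candidates(candidates, target, n_trials=20, n_records=5000, seed=0):
    """Run each candidate selection function over simulated data and print
    the average number of elements kept and the average fit error."""
    rng = random.Random(seed)
    for name, fn in candidates.items():
        kept, errors = [], []
        for _ in range(n_trials):
            data = [rng.gauss(target, 25) for _ in range(n_records)]
            chosen = fn(data, target)
            kept.append(len(chosen))
            errors.append(abs(statistics.mean(chosen) - target))
        print(f"{name:>15}: kept {statistics.mean(kept):8.1f} elements, "
              f"mean |average - target| = {statistics.mean(errors):.4f}")

# e.g.:
# compare_candidates(
#     {"sort-and-trim": trim_to_target_average,
#      "single-pass": single_pass_pool},
#     target=100.0)
```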