in reply to Picking the best points
In order to reduce your dataset of N points to the required M points, you need to discard the (N-M) "least valuable" points.
Determination of value must entail two criteria: how close a point lies to its neighbours, and how large its uncertainty (dy) is.
As there is no fixed ratio of N to M, and the distribution is not even, the specification of "close" will need to evolve throughout the discard process.
One approach to this is to consider the two closest points first and discard whichever of them has the greater uncertainty.
Then consider the next closest (now closest) pair, and again discard the more uncertain point.
Continue until either the M target has been achieved, or the closest remaining pair are too far apart to be considered close.
The following code does this. The relevant part of it is the while( @ordered > $RETAIN ){ loop.
Most of the rest just plots two (offset) graphs of the before (red) and after (green) data to let me visualise the results. On those graphs, the size of the circle around each point is its uncertainty (dy) value. In the after graph, the absence of any large circles shows the effectiveness of the run.
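To illustrate the circle-per-point idea, here is a minimal sketch of such a plotting helper, assuming GD; the module choice, the image geometry, the scale factor, and the plotPoints name are all illustrative rather than taken from the program itself, and scaling data coordinates to pixels is left to the caller:

    use strict; use warnings;
    use GD;

    ## Hypothetical helper: draw each [ $x, $y, $dy ] point with a circle
    ## whose diameter is proportional to its uncertainty (dy). Assumes the
    ## coordinates have already been scaled to pixel space.
    sub plotPoints {
        my( $file, $points ) = @_;
        my $img   = GD::Image->new( 800, 400 );
        my $white = $img->colorAllocate( 255, 255, 255 );  ## background
        my $red   = $img->colorAllocate( 255,   0,   0 );

        for my $p ( @$points ) {
            my( $x, $y, $dy ) = @$p;
            $img->setPixel( $x, $y, $red );
            $img->arc( $x, $y, $dy * 20, $dy * 20, 0, 360, $red );
        }

        open my $fh, '>:raw', $file or die "$file: $!";
        print {$fh} $img->png;
        close $fh;
    }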
This is the result of using your generator to get 1000 points, which are then reduced to 50:
c:\test>868223-gen -N=1000 | 868223-plot -RETAIN=50
The code:
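Only the relevant part is sketched below; the plotting is omitted. Two assumptions worth flagging: the input is taken to be whitespace-separated x, y, dy triples on STDIN, and the -CLOSE switch naming the "too far apart" cut-off is a name introduced for this sketch:

    #! perl -slw
    use strict;

    our $RETAIN //= 50;     ## M: the number of points to keep
    our $CLOSE  //= 9e99;   ## gaps wider than this are "not close"

    ## Read the points (assumed format: "x y dy" per line) and sort by x.
    my @ordered = sort { $a->[0] <=> $b->[0] }
                  map  { [ split ] } <STDIN>;

    while( @ordered > $RETAIN ) {
        ## Sorted by x, so the closest pair must be adjacent; find it.
        my( $best, $bestGap ) = ( 0, 9e99 );
        for my $i ( 0 .. $#ordered - 1 ) {
            my $gap = $ordered[ $i + 1 ][ 0 ] - $ordered[ $i ][ 0 ];
            ( $best, $bestGap ) = ( $i, $gap ) if $gap < $bestGap;
        }

        ## Even the closest pair is no longer "close"; stop discarding.
        last if $bestGap > $CLOSE;

        ## Of the pair, discard the point with the greater uncertainty.
        my $discard = $ordered[ $best ][ 2 ] >= $ordered[ $best + 1 ][ 2 ]
                    ? $best : $best + 1;
        splice @ordered, $discard, 1;
    }

    print "@$_" for @ordered;   ## emit the retained points

With -CLOSE left at its effectively-disabled default, the loop simply discards down to -RETAIN points, matching the 868223-plot -RETAIN=50 invocation above.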