in reply to Picking the best points

To reduce your dataset of N points down to the required M points, you need to discard the (N-M) "least valuable" points.

Determining value involves two criteria:

  1. If a given point is the only point within some region of the graph, it is valuable.
  2. If a given point is "close" to another point, the less valuable of the two is the one with the greater uncertainty (dy).

As there is no fixed ratio of N to M, and the distribution is not even, the definition of "close" will need to evolve as the discard process proceeds.
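To make criterion 2 concrete, here is a minimal sketch of the discard decision. The [ $x, $y, $dy ] representation is an assumption for illustration, not taken from the code below:

    ## Assumed representation: a point is an array ref [ $x, $y, $dy ].
    ## Of two "close" points, the one with the greater uncertainty (dy)
    ## is the less valuable; return it as the candidate for discard.
    sub lessValuable {
        my( $p, $q ) = @_;
        return $p->[2] >= $q->[2] ? $p : $q;
    }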

One approach is to consider the two closest points first and discard the one with the greater uncertainty.

Then consider the next closest (now closest) pair, and again discard the less certain of the two.

Continue until either the target of M points has been reached, or the closest remaining pair are too far apart to be considered close.

The following code does this. The relevant part of it is the while( @ordered > $RETAIN ){ loop.

Most of the rest just plots two (offset) graphs of the before (red) and after (green) data to allow me to visualise the results. On those graphs, the size of the circle around each point represents its uncertainty (dy) value. In the after graph, the absence of any large circles shows the effectiveness of the run.
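For anyone wanting to reproduce that kind of picture, here is a small sketch that pipes the points to gnuplot. The use of gnuplot is an assumption (any plotter that can draw sized circles will do); it is not the plotting code used for the graphs described above:

    #! perl -slw
    ## Sketch only. Assumes gnuplot is on the PATH and input lines hold
    ## "x y dy"; gnuplot's 'with circles' style reads the third column
    ## as the circle radius, so each point's circle scales with dy.
    use strict;

    my @points = map[ split ' ' ], <STDIN>;

    open my $gp, '|-', 'gnuplot -persist' or die "Cannot start gnuplot: $!";
    print $gp "set style fill transparent solid 0.35 noborder";
    print $gp "plot '-' using 1:2:3 with circles lc rgb 'red' title 'points'";
    print $gp "@$_" for @points;
    print $gp "e";
    close $gp;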

This is the result of using your generator to get 1000 points, which are then reduced to 50:

c:\test>868223-gen -N=1000 | 868223-plot -RETAIN=50

The code:
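The full listing is not reproduced here; what follows is a minimal sketch of the while( @ordered > $RETAIN ){ loop described above, not the complete program. It assumes input lines of the form "x y dy", treats "close" as the x-gap between x-ordered neighbours, and adds a hypothetical -MAXGAP switch for the "too far apart to be considered close" cutoff:

    #! perl -slw
    use strict;

    our $RETAIN //= 50;    ## M: the number of points to keep
    our $MAXGAP //= 9e99;  ## hypothetical: pairs with a gap above this aren't "close"

    ## Points ordered by x; each is [ $x, $y, $dy ].
    my @ordered = sort { $a->[0] <=> $b->[0] } map[ split ' ' ], <STDIN>;

    while( @ordered > $RETAIN ){
        ## Find the closest adjacent pair.
        my( $best, $bestGap ) = ( 0, 9e99 );
        for my $i ( 0 .. $#ordered - 1 ) {
            my $gap = $ordered[ $i+1 ][0] - $ordered[ $i ][0];
            ( $best, $bestGap ) = ( $i, $gap ) if $gap < $bestGap;
        }
        ## Stop early if even the closest pair isn't "close".
        last if $bestGap > $MAXGAP;

        ## Of the pair, discard the point with the greater uncertainty (dy).
        my $discard = $ordered[ $best ][2] > $ordered[ $best+1 ][2]
                    ? $best : $best + 1;
        splice @ordered, $discard, 1;
    }

    print "@$_" for @ordered;

A linear rescan per discard keeps the sketch simple; reducing 1000 points to 50 needs at most 950 such scans, which is effectively instant at this scale.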


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy