To reduce your dataset of N points to the required M points, you need to discard the N - M "least valuable" points.

Determining value involves two criteria:

  1. If a given point is the only point within some region of the graph, it is valuable.
  2. If a given point is "close" to another point, the less valuable of the two is the one with the greater uncertainty ($dy).

As there is no fixed ratio of N to M, and the distribution is not "even", the definition of "close" will need to evolve throughout the discard process.

One approach to this would be to consider the two closest points first and discard the one with the greatest uncertainty.

Then consider the next closest (now closest) two points, and again discard the least certain.

Continue until either the target M has been reached or the closest pair is too far apart to be considered close.
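The discard loop described above can be sketched in Perl roughly as follows. This is not the code from this post: the name thin_points, the { x, y, dy } hash layout and the optional $max_dist cut-off are hypothetical, and "distance" is assumed to be plain Euclidean distance in data units (rescale x and y first if they are on very different scales). Rather than searching for the closest pair afresh after every discard, it sorts all pairs by distance once and walks that list, skipping any pair that has already lost a member. Because the surviving points never move, the first intact pair is always the closest remaining one, so this is the same greedy procedure. Note that it holds N(N-1)/2 pairs in memory (about 500,000 for N = 1000).

```perl
use strict;
use warnings;

# Hypothetical helper: greedily discard the more uncertain member of the
# closest remaining pair until $retain points are left, or until the closest
# pair is farther apart than $max_dist (undef means no distance limit).
# Each point is assumed to be a hashref { x => ..., y => ..., dy => ... }.
sub thin_points {
    my ( $points, $retain, $max_dist ) = @_;
    my $n = @$points;

    # Build every pair once, tagged with its squared Euclidean distance.
    my @pairs;
    for my $i ( 0 .. $n - 2 ) {
        for my $j ( $i + 1 .. $n - 1 ) {
            my $ddx = $points->[$i]{x} - $points->[$j]{x};
            my $ddy = $points->[$i]{y} - $points->[$j]{y};
            push @pairs, [ $ddx * $ddx + $ddy * $ddy, $i, $j ];
        }
    }
    @pairs = sort { $a->[0] <=> $b->[0] } @pairs;    # closest pairs first

    my %dropped;
    my $alive = $n;
    for my $pair (@pairs) {
        last if $alive <= $retain;                   # target M reached
        my ( $d2, $i, $j ) = @$pair;
        next if $dropped{$i} || $dropped{$j};        # pair no longer exists
        # Every later pair is farther apart still, so stop here.
        last if defined $max_dist && $d2 > $max_dist * $max_dist;

        # Discard whichever of the two has the greater uncertainty.
        my $drop = $points->[$i]{dy} >= $points->[$j]{dy} ? $i : $j;
        $dropped{$drop} = 1;
        --$alive;
    }
    return map { $points->[$_] } grep { !$dropped{$_} } 0 .. $n - 1;
}

# Small demonstration with made-up points.
my @sample = (
    { x => 0,   y => 0,    dy => 1   },
    { x => 0.1, y => 0,    dy => 5   },
    { x => 10,  y => 10,   dy => 1   },
    { x => 10,  y => 10.2, dy => 0.5 },
    { x => 50,  y => 50,   dy => 9   },    # isolated, so it survives
);
my @kept = thin_points( \@sample, 3 );
printf "x=%g y=%g dy=%g\n", @{$_}{qw(x y dy)} for @kept;
# prints x=0 y=0 dy=1, then x=10 y=10.2 dy=0.5, then x=50 y=50 dy=9
```

In the demonstration, the two tight pairs each lose their more uncertain member, while the lone point at (50, 50) is kept despite having the largest dy, which is how criterion 1 falls out of the closest-pair approach.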

The following code does this. The relevant part is the while( @ordered > $RETAIN ){ loop.

Most of the rest just plots two (offset) graphs of the before (red) and after (green) data, so that I can visualise the results. On those graphs, the size of the circle around each point represents its uncertainty (dy). In the after graph, the absence of any large circles shows how effective the run was.

This is the result of using your generator to get 1000 points, which are then reduced to 50:

c:\test>868223-gen -N=1000 | 868223-plot -RETAIN=50

The code:


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

In reply to Re: Picking the best points by BrowserUk
in thread Picking the best points by kennethk
