srdst13 has asked for the wisdom of the Perl Monks concerning the following question:
I have a set of data that I want to do a particular operation on to consolidate from a larger set to a smaller, cleaner set. The data consists of a set of points on a line, with each point having a "quality" associated with it. An interval is simply the distance between two adjacent points. I would like to start with a large number of such points (that I generate) and consolidate them as follows:
The goal is to produce the highest quality set of points at the most uniform spacing possible. My question, now, is pretty vague. I can pretty easily figure A way to do this, but I will have perhaps a million points, so doing things like sorting after each iteration will probably be prohibitive. Does anyone have any clever data structures that might be beneficial here?
I am trying a heap for managing the intervals (so that I can quickly choose the shortest interval). I am using two basic objects, a "point" and an "interval". The interval has an associated distance along with references to each of the points that form the interval. The "point" object has a quality and a location as well as a reference to each of the one (at the ends of the line) or two intervals. So, I choose the shortest interval, choose the worse of the two points in that interval in terms of quality, and then do some reference swapping to remove the old interval and point. Finally, one of the adjacent intervals is adjusted to bridge the gap left by removing the point and distance is recalculated. Heap::Simple::XS allows one to use a method call to arrange to get the top interval (shortest) from the heap, so at the expense of a bit of efficiency, simply altering the affected interval suffices to have it end up in the correct location on the heap.
When I get the code completed, I'll flesh out some details here, but with simple mock data, 1e6 intervals is quite doable in a matter of minutes, which is acceptible.
Thanks to all for the various suggestions.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Points on a line and associated intervals
by BrowserUk (Patriarch) on May 18, 2006 at 03:43 UTC | |
by srdst13 (Pilgrim) on May 18, 2006 at 10:56 UTC | |
by BrowserUk (Patriarch) on May 18, 2006 at 12:28 UTC | |
|
Re: Points on a line and associated intervals
by BrowserUk (Patriarch) on May 18, 2006 at 06:17 UTC | |
|
Re: Points on a line and associated intervals (metrics)
by tye (Sage) on May 18, 2006 at 07:02 UTC | |
|
Re: Points on a line and associated intervals
by TedPride (Priest) on May 18, 2006 at 03:05 UTC | |
|
Re: Points on a line and associated intervals
by turo (Friar) on May 18, 2006 at 01:56 UTC | |
|
Re: Points on a line and associated intervals
by GrandFather (Saint) on May 18, 2006 at 01:56 UTC |