The actual datasets are quite lumpy. The source data are thermophysical property measurements, so, for example, sets tend to have a very large number of points near 25 °C. I like the bucket-oriented process, though I am not opposed to keeping points that are near each other. Points that are close together in space and have low reported uncertainties frequently still disagree with each other, which is why I'd like to keep a fairly large number of points. Preliminarily, I'm favoring
's suggestion, though I might use buckets as an initial pass to guarantee good spatial coverage, depending on some empirical testing. And in any case, your result looks better than mine.
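
For concreteness, here is a minimal sketch of what a bucket-based first pass could look like. It assumes equal-width buckets along a single variable (temperature, say) and a random choice of survivors within each crowded bucket. The name `bucket_thin` and its parameters are hypothetical, made up for illustration and not taken from anything posted above.

```python
import random
from collections import defaultdict


def bucket_thin(values, n_buckets=10, per_bucket=5, seed=0):
    """Return sorted indices of the points to keep.

    Hypothetical helper: splits the range of `values` into `n_buckets`
    equal-width buckets and keeps at most `per_bucket` points from each,
    so dense regions are thinned while sparse regions are left alone.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_buckets or 1.0
    buckets = defaultdict(list)
    for i, v in enumerate(values):
        # Clamp so the maximum value lands in the last bucket.
        b = min(int((v - lo) / width), n_buckets - 1)
        buckets[b].append(i)
    rng = random.Random(seed)
    kept = []
    for idxs in buckets.values():
        if len(idxs) <= per_bucket:
            kept.extend(idxs)  # sparse bucket: keep everything
        else:
            kept.extend(rng.sample(idxs, per_bucket))  # crowded bucket: thin it
    return sorted(kept)


# Made-up example: a pile of points at 25 C plus a few scattered ones.
temps = [25.0] * 50 + [0.0, 10.0, 40.0, 60.0, 80.0, 100.0]
kept = bucket_thin(temps, n_buckets=10, per_bucket=5)
```

Raising `per_bucket` keeps more of the nearby points that disagree with each other, while the bucketing still guarantees that no sparse region is dropped.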