in reply to Re^4: fastest way to compare numerical strings? (300_000 points in under 4 minutes!)
in thread fastest way to compare numerical strings?
For instance, with $scale = 1.0, the points (1.0, 0) (2.5, 0) (1.8, 0) will not be classified in the same cluster.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^6: fastest way to compare numerical strings? (Problem acknowledged)
by BrowserUk (Patriarch) on Jul 04, 2008 at 08:33 UTC | |
It took a few seconds to understand this. But yes, it is a problem. Thanks for bringing it to my attention. Whether the problem occurs depends on what order the points get plotted. Now the question is, is there a way of sorting the input to ensure that closer points get plotted before those further away (which would prevent it from occuring)? And what effect would sorting the input have upon the performance? An alternative fix would be to use some kind of AndOr or Xor mode when plotting; or perhaps something with the alpha channel, but I haven't thought that through yet. I'll need to think about it for a while. Have any potential fixes crossed your mind? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
by salva (Canon) on Jul 04, 2008 at 09:55 UTC | |
Now the question is, is there a way of sorting the input to ensure that closer points get plotted before those further away (which would prevent it from occuring)?I don't think that can be solved just presorting the input. I'll need to think about it for a while. Have any potential fixes crossed your mind?Yes, I think that it can be done with the following changes: | [reply] |
by BrowserUk (Patriarch) on Jul 05, 2008 at 07:01 UTC | |
salva. I've fixed the problem. The solution I've chosen is to make two passes of the data. During the first pass, I draw each of the circle at scale (as before), but then reset the central pixel back to whatever color (background or previously draw circle color) was there before. The second pass operates exactly as it did previously. Because during the second pass, all the 'circles of influence' have already been drawn once, there is no ordering issue. By unsetting the central pixel during the first pass, it will only be colored if a subsequent circle overlays it. The effect is most clearly demonstrated by 'ficker comparing' between the two images produced when you run the modifed code below. (Performance wise, the code now processes the OPs original 300_000 point scenario, finding (for example) 127000 points in 93000 groups in under 2 minutes.) I suggest the following commandline options as giving the clearest demonstration: perl -slw 604750.pl -POINTS=100 -SCALER=800 -DELTA=100 What you will see is that in the image: 604790all.png, all the circles are plotted. Any circles whos central pixel is colored, are grouped. Those where the central pixel is white are not part of any group. In the second image 694790grouped.png, just the grouped circles are draw in outline. By flicker comparing the two images, (I believe that) you can clearly see that the algorithm is now operating correctly. I'd really value your confirmation (or otherwise)?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
Re^6: fastest way to compare numerical strings? (Potential solution)
by BrowserUk (Patriarch) on Jul 04, 2008 at 09:49 UTC | |
I think I've come up with a potential fix, though I need to sleep before I will have the tuits to code it. The mechanism I envisage is to plot the circles larger than scale. This will cause points that are further away than the specified limit to be grouped artificially. You then replot just the points in each group, at the correct scaling, on a clean canvas using alpha-blending, to eliminate the false hits. The first pass will reduce the problem to a few small, isolated groups making the second pass only need to deal with a few points at a time, so the slowdown shouldn't be significant. I'll try to post a corrected solution later. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |