in reply to Re^3: fastest way to compare numerical strings?
in thread fastest way to compare numerical strings?
The code below finds all the groupings of points within 0.01 degrees of each other from 300_000 pairs of coordinates in under 4 minutes (3:42) on my less-than-stellar 2.66GHz single processor box.
When it goes to plot the second point, it first checks the color of the pixel at the scaled coordinates. If this is still white, then this point does not come within 0.01 of any point plotted so far, and it is plotted in a color that corresponds to its position in the array.
However, if the pixel is some color other than white, this pair is within 0.01 of some point already plotted. In this case, the new circle is plotted in that color, forming a group of that color. It is also added to an array of points (keyed by that color), which effectively groups the cluster of points.
This process continues for all the points. If the color at the centre of its location is not white, then its circle is plotted in the color found there and it is added to the HoAs keyed by that color. If that pixel is white, it is plotted in a new, unique color to await being joined (or not) by a subsequent point.
In a single pass, all clusters are detected and can then be printed out or further processed.
The code below also outputs two images. The first contains circles drawn for every point. The second contains just the points that became grouped with at least one other point. With randomly generated data, there is not much difference between the images, but they can be seen if you blow the images up a bit and flip-flop between them.
It produces a file whereby the grouped indexes have been converted back to their corresponding coordinated and grouped. It looks like this (a small section):
[ 10.7685250292969 24.0294289818115 ] [ 10.764576366333 24.0289633201904 ] ------ [ 11.8467422927246 22.0280153343506 ] [ 11.8505748185425 22.0356987510986 ] ------ [ 10.3691293842163 22.0120664238281 ] [ 10.3718005385742 22.0104366081543 ] ------ [ 10.9817528293457 22.6283695793457 ] [ 10.974320052002 22.6306978874512 ] ------ [ 12.2627455496826 23.6263988487549 ] [ 12.2549643609009 23.62430337146 ] ------ [ 13.0046296383057 23.9849582969971 ] [ 13.0082298898315 23.9801852653809 ] ------ [ 12.3460158833618 23.0528201469727 ] [ 12.3482224891357 23.0609692253418 ] ------ [ 13.6142335176392 24.0877530998535 ] [ 13.619808100647 24.0818159141846 ] ------ [ 10.4141905968628 21.2074031425781 ] [ 10.4150035568848 21.2060061577148 ] ------ [ 11.0738495861206 23.2290730705566 ] [ 11.0661845344849 23.2240672081299 ] [ 11.0827921463623 23.2314013786621 ] ------ [ 10.8340263796387 21.6198629234619 ] [ 10.8307745395508 21.6281284172363 ] [ 10.8416914312744 21.6245195396729 ] ------
Further processing to do the date filtering is left as an exercise :)
The test code
#! perl -slw use strict; use Data::Dump qw[ pp ]; use List::Util qw[ reduce ]; use GD; use constant { WHITE => 16777215, MinX => 10.318842, MaxX => 14.124424, MinY => 21.097507, MaxY => 24.912207, }; use constant { SpanX => MaxX - MinX, SpanY => MaxY - MinY, }; sub rgb2n{ unpack 'N', pack 'CCCC', 0, @_ } sub n2rgb{ unpack 'xCCC', pack 'N', $_[0] } our $POINTS ||= 300_000; my @points = map [ MinX + rand( SpanX ), MinY + rand( SpanY ) ], 1 .. +$POINTS; ## Normalise to our grid, scale by 1000 and integerise my @scaled = map [ int( ( $_->[ 0 ] - MinX ) * 1000 ), int( ( $_->[ 1 ] - MinY ) * 1000 ), ], @points; my( $sizeX, $sizeY ) = ( int( SpanX * 1000 ), int( SpanY * 1000 ) ); my $img = GD::Image->new( $sizeX, $sizeY , 1 ); $img->filledRectangle( 0, 0, $sizeX, $sizeY, rgb2n( 255, 255, 255 ) ); my %groups; for my $i ( 0 .. $#scaled ) { my $color = $img->getPixel( @{ $scaled[ $i ] } ); if( $color != WHITE ) { push @{ $groups{ $color } }, $i; } else { $color = $i; } $img->filledEllipse( @{ $scaled[ $i ] }, 20, 20, $i ); } ## Save the image showing all points/circles open IMG, '>:raw', '694790all.png' or die $!; print IMG $img->png; close IMG; system 1, '694790all.png'; undef $img; ## Produce a file of clusters of points open OUT, '>', '694790.groups' or die $!; for my $key ( sort { $a <=> $b } keys %groups ) { print OUT "[ @{ $points[ $_ ] } ]" for $key, @{ $groups{ $key } }; print OUT "------"; } close OUT; ## Produce second image showing just the clustered points. $img = GD::Image->new( $sizeX, $sizeY , 1 ); $img->filledRectangle( 0, 0, $sizeX, $sizeY, rgb2n( 255, 255, 255 ) ); for my $key ( sort { $a <=> $b } keys %groups ) { $img->filledEllipse( @{ $scaled[ $key ] }, 20, 20, $key ); $img->filledEllipse( @{ $scaled[ $_ ] }, 20, 20, $key ) for @{ $gr +oups{ $key } }; } open IMG, '>:raw', '694790grouped.png' or die $!; print IMG $img->png; close IMG; system 1, '694790grouped.png'; __END__ Latitude stats: Minimum: 10.318842 Maximum: 14.124424 Mean: 11.24719 Standard Deviation: 0.805771 Longitude stats: Minimum: 21.097507 Maximum: 24.912207 Mean: 22.474358 Standard Deviation: 0.974835
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^5: fastest way to compare numerical strings? (300_000 points in under 4 minutes!)
by salva (Canon) on Jul 04, 2008 at 08:01 UTC | |
by BrowserUk (Patriarch) on Jul 04, 2008 at 08:33 UTC | |
by salva (Canon) on Jul 04, 2008 at 09:55 UTC | |
by BrowserUk (Patriarch) on Jul 05, 2008 at 07:01 UTC | |
by BrowserUk (Patriarch) on Jul 04, 2008 at 09:49 UTC |