in reply to fastest way to compare numerical strings?
For the following code there is only one pass through the input array at the beginning and then sorts of small arrays ("small" in case the assumption made at the beginning holds).
Output:#maximum distance we are looking for $delta = 0.01; #test array @a = (1.02, 1.03, 6.01, 9, 1.04, 1.011, 1.025, 1.01, 0.005, -0.002); #"discretize" points to neighboring points, scale by 1/$delta #to simplify computation for (@a) { push @{$h{int($_/$delta)}}, $_; push @{$h{int($_/$delta-1)}}, $_; push @{$h{int($_/$delta+1)}}, $_; } #handle clusters for (keys %h) { #in case the corresponding array has more than one element, #we know that it contains at least one pair not #further apart than 2 * $delta, otherwise ignore it if (@{$h{$_}} > 1) { @sorted = sort @{$h{$_}}; for (0..@sorted-2) { $r = $sorted[$_]; $s = $sorted[$_+1]; #filter out neighboring pairs, since we do not need to #process the numbers further, we cram them into a #string for the final output $near{"$r, $s"} = 1 if ($r < $s && $s <= $r + $delta); } } } print "Not further than $delta apart are the following pairs:\n"; print "$_\n" for (keys %near);
Update: Tested using the Benchmark module and a fixed array of 300000 numbers randomly distributed between 0 and 300, the near number determining part took about 8 seconds on my reasonably modern machine.Not further than 0.01 apart are the following pairs: -0.002, 0.005 1.02, 1.025 1.025, 1.03 1.011, 1.02 1.03, 1.04 1.01, 1.011
I.e. the benchmark test starts like this:
Update: Improved the description, it gave the impression we first look for all numbers in the array not more than 2 * $delta apart, which is not the case.use strict; use warnings; use Benchmark; #maximum distance we are looking for my $delta = 0.01; #test array my @a; for (1..300000) { push @a, rand() * 300; } my ($r, $s); timethis ( 10 => sub { ...
|
|---|