in reply to Sorting ip addresses quickly
The first thing I noticed was that you said: "Now we can use the numeric conversion of each ip address to compare it ...". This is in regards to your transformation from 1.2.3.4 form to 001002003004 form. However, you use cmp not <=> to compare your ips after this point. cmp is for strings of course, and <=> for numbers. I thought that maybe by simply changing this we could get a little speed up:
New code:
my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, int sprintf("%03.f%03.f%03.f%03.f", split(/\. +/, $_))] } @unsorted;
The two differences are that we are now forcing perl to compare the two ips as numbers not strings -- which should be faster, as a 12 digit binary number will only have 5 bytes to compare, whereas a 12 digit character string would have 12 bytes to compare. Also, concerning space, I added an int infront of the sprintf, in order to force perl to store the ip as a number, and not as a string.
The result? A slight speed down (6%), I didn't bother to measure the space taken, it should have simply been proportional. However the fact that it got slower shows that the cost difference between 5 comparisons and 12 was not worth the additional cost of converting the 12 character string into a 5 byte number. However it is good to know that it should use less space, which may be more important than 6% speed difference.
Results:
Now, the real kicker is, why use sprintf? The first warning is that you are using a number as a string for longer than you need to. The modified code below treats the octets as numbers as soon as they shoot out of the split statement. On top of this sprintf isn't going to be the fastest subroutine to call.Loading ips Loaded 1000 ips Benchmark: running Fastnum, Faststr, each for at least 10 CPU seconds. +.. Fastnum: 11 wallclock secs (11.28 usr + 0.00 sys = 11.28 CPU) @ 7 +.98/s (n=90) Faststr: 12 wallclock secs (11.34 usr + 0.00 sys = 11.34 CPU) @ 8 +.29/s (n=94)
The second thing is, why multiply by 1000? Ip's can only range up to 255, not 999. So just multiply by 256 (which can be optimized away as a bit-shift instead of a full blown integer).
Is the cost of string conversion worth the speed increase of removing the sprintf? Remember we are also substituting the sprintf with 4 multiplies. Of course we are also getting one byte back too, 4 bytes to store the ip instead of 5, so slightly faster to compare too (though as we can see the cost of converting can outway the time sorting).
So here is the newer code, and benchmark below it.
my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { my ($x,$y)=(0,$_); $x=$_ + $x * 256 for split(/\./, $y); [$y,$x]} @unsorted;
Tada, a 57% increase! Much much better. As you can see a subroutine call can make a big difference, they cost alot, plus you often don't get to know all the stuff that goes on inside.Loading ips Loaded 1000 ips Benchmark: running Faster, Faststr, each for at least 10 CPU seconds.. +. Faster: 11 wallclock secs (11.09 usr + 0.00 sys = 11.09 CPU) @ 12 +.98/s (n=144) Faststr: 11 wallclock secs (11.24 usr + 0.00 sys = 11.24 CPU) @ 8 +.27/s (n=93)
Well here is one final benchmark, all three running on a larger file. The main idea here is to put more weight on sorting side of the process than the transformation:
Aha! As you can see, now fastnum is now nearly the same speed as faststr. So the difference of saving 7 comps for the cost of string conversion eventually pays off. This is, as DrManhattan mentioned earlier, because you are paying the cost of your comps N*log(N) times, not just N. Also notice that our speedup with the new code is also less drastic now that we have 100 times the ips to sort.Loading ips Loaded 100000 ips Benchmark: timing 5 iterations of Faster, Fastnum, Faststr... Faster: 58 wallclock secs (57.90 usr + 0.21 sys = 58.11 CPU) Fastnum: 83 wallclock secs (82.57 usr + 0.38 sys = 82.95 CPU) Faststr: 83 wallclock secs (82.74 usr + 0.00 sys = 82.74 CPU)
Anyway, it's been fun,
Gryn
Updated: Fixed a small typo (thanks splinky)
|
---|
Replies are listed 'Best First'. | |
---|---|
RE: Sorting ip addresses quicker
by splinky (Hermit) on Jul 14, 2000 at 07:49 UTC | |
RE: Sorting ip addresses quicker
by DrManhattan (Chaplain) on Jul 21, 2000 at 23:27 UTC | |
by gryng (Hermit) on Jul 23, 2000 at 01:37 UTC |