comment on

You are indeed correct DrManhattan however a few small things that I'd like to add to make this code even faster. (Though, I'll readily admit, probably not the fastest!)

The first thing I noticed was that you said: "Now we can use the numeric conversion of each ip address to compare it ...". This is in regards to your transformation from 1.2.3.4 form to 001002003004 form. However, you use cmp not <=> to compare your ips after this point. cmp is for strings of course, and <=> for numbers. I thought that maybe by simply changing this we could get a little speed up:

New code:

my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [$_, int sprintf("%03.f%03.f%03.f%03.f", split(/\.
+/, $_))] }
             @unsorted;
[download]

The two differences are that we are now forcing perl to compare the two ips as numbers not strings -- which should be faster, as a 12 digit binary number will only have 5 bytes to compare, whereas a 12 digit character string would have 12 bytes to compare. Also, concerning space, I added an int infront of the sprintf, in order to force perl to store the ip as a number, and not as a string.
The result? A slight speed down (6%), I didn't bother to measure the space taken, it should have simply been proportional. However the fact that it got slower shows that the cost difference between 5 comparisons and 12 was not worth the additional cost of converting the 12 character string into a 5 byte number. However it is good to know that it should use less space, which may be more important than 6% speed difference.

Results:

Loading ips
Loaded 1000 ips
Benchmark: running Fastnum, Faststr, each for at least 10 CPU seconds.
+..
   Fastnum: 11 wallclock secs (11.28 usr +  0.00 sys = 11.28 CPU) @  7
+.98/s (n=90)
   Faststr: 12 wallclock secs (11.34 usr +  0.00 sys = 11.34 CPU) @  8
+.29/s (n=94)
[download]

Now, the real kicker is, why use sprintf? The first warning is that you are using a number as a string for longer than you need to. The modified code below treats the octets as numbers as soon as they shoot out of the split statement. On top of this sprintf isn't going to be the fastest subroutine to call.

The second thing is, why multiply by 1000? Ip's can only range up to 255, not 999. So just multiply by 256 (which can be optimized away as a bit-shift instead of a full blown integer).

Is the cost of string conversion worth the speed increase of removing the sprintf? Remember we are also substituting the sprintf with 4 multiplies. Of course we are also getting one byte back too, 4 bytes to store the ip instead of 5, so slightly faster to compare too (though as we can see the cost of converting can outway the time sorting).

So here is the newer code, and benchmark below it.

my @sorted = map { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map {      my ($x,$y)=(0,$_);
                        $x=$_ + $x * 256 for split(/\./, $y);
                        [$y,$x]}
             @unsorted;
[download]

Loading ips
Loaded 1000 ips
Benchmark: running Faster, Faststr, each for at least 10 CPU seconds..
+.
    Faster: 11 wallclock secs (11.09 usr +  0.00 sys = 11.09 CPU) @ 12
+.98/s (n=144)
   Faststr: 11 wallclock secs (11.24 usr +  0.00 sys = 11.24 CPU) @  8
+.27/s (n=93)
[download]

Tada, a 57% increase! Much much better. As you can see a subroutine call can make a big difference, they cost alot, plus you often don't get to know all the stuff that goes on inside.

Well here is one final benchmark, all three running on a larger file. The main idea here is to put more weight on sorting side of the process than the transformation:

Loading ips
Loaded 100000 ips
Benchmark: timing 5 iterations of Faster, Fastnum, Faststr...
    Faster: 58 wallclock secs (57.90 usr +  0.21 sys = 58.11 CPU)
   Fastnum: 83 wallclock secs (82.57 usr +  0.38 sys = 82.95 CPU)
   Faststr: 83 wallclock secs (82.74 usr +  0.00 sys = 82.74 CPU)
[download]

Aha! As you can see, now fastnum is now nearly the same speed as faststr. So the difference of saving 7 comps for the cost of string conversion eventually pays off. This is, as DrManhattan mentioned earlier, because you are paying the cost of your comps N*log(N) times, not just N. Also notice that our speedup with the new code is also less drastic now that we have 100 times the ips to sort.

Anyway, it's been fun,
Gryn

Updated: Fixed a small typo (thanks splinky)

In reply to Sorting ip addresses quicker by gryng
in thread Sorting ip addresses quickly by DrManhattan

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.