Re: Sorting IP addresses, lots of them, quickly

Replies are listed 'Best First'.
Re^2: Sorting IP addresses, lots of them, quickly by salva (Canon) on May 16, 2007 at 16:30 UTC
If only perl had an in-place sort option... Actually it has, `@in = sort @in;` is optimized as an in-place sort!	[reply] [d/l]
Re^3: Sorting IP addresses, lots of them, quickly by jdporter (Paladin) on May 16, 2007 at 18:58 UTC
Do you have any documentation for that? The sort manpage doesn't mention any such thing. A word spoken in Mind will reach its own level, in the objective world, by its own weight	[reply]
Re^4: Sorting IP addresses, lots of them, quickly by salva (Canon) on May 16, 2007 at 19:24 UTC
AFAIK, it is not officially documented in any place, but you can see it's true dumping the OP tree: salva@vespa:/tmp$ perl -MO=Concise -e '@i = sort @j' c <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) v ->3 b <2> aassign[t5] vKS ->c - <1> ex-list lK ->8 3 <0> pushmark s ->4 7 <@> sort lK ->8 4 <0> pushmark s ->5 6 <1> rv2av[t4] lK/1 ->7 5 <#> gv[j] s ->6 - <1> ex-list lK ->b 8 <0> pushmark s ->9 a <1> rv2av[t2] lKRM/1 ->b 9 <#> gv[i] s ->a -e syntax OK salva@vespa:/tmp$ perl -MO=Concise -e '@i = sort @i' 8 <@> leave[1 ref] vKP/REFC ->(end) 1 <0> enter ->2 2 <;> nextstate(main 1 -e:1) v ->3 - <1> ex-aassign vKS/64 ->8 - <1> ex-list lK ->- 3 <0> pushmark s ->4 7 <@> sort lK/INPLACE ->8 4 <0> pushmark s ->5 6 <1> rv2av[t4] lKRM/1 ->7 5 <#> gv[i] s ->6 - <1> ex-list lK ->- - <0> ex-pushmark s ->- - <1> ex-rv2av lKRM/1 ->- - <0> ex-gv s ->- -e syntax OK [download]	[reply] [d/l]
Re^4: Sorting IP addresses, lots of them, quickly by jwkrahn (Abbot) on May 16, 2007 at 20:12 UTC
See: perl584delta	[reply]
Re^2: Sorting IP addresses, lots of them, quickly by gloryhack (Deacon) on May 16, 2007 at 20:05 UTC
I'm seeing what appears to be a hardware dependence, too. On an old dual P-II/450 with 1G of RAM, I get (with my original 11818 address list): `Rate ukeysort grt schwartzian gloryhackish + keysort ukeysort 1.63/s -- -20% -21% -39% + -70% grt 2.04/s 25% -- -0% -23% + -62% schwartzian 2.05/s 26% 0% -- -23% + -62% gloryhackish 2.66/s 63% 30% 30% -- + -51% keysort 5.41/s 232% 165% 163% 103% + --` [download] Somehow, ukeysort went to the back of the line, with GRT not being a whole lot zippier and inconsequentially slower than ST. On a similar machine with just a single P-II/500 and 640M of RAM, though: `Rate schwartzian ukeysort grt gloryhackish + keysort schwartzian 2.28/s -- -22% -27% -42% + -71% ukeysort 2.92/s 28% -- -7% -25% + -63% grt 3.14/s 38% 8% -- -20% + -60% gloryhackish 3.91/s 72% 34% 25% -- + -50% keysort 7.86/s 245% 169% 151% 101% + --` [download] GRT beats ukeysort again and is well ahead of ST. But on an Athlon 3900+ w/1G RAM: `Rate schwartzian grt ukeysort gloryhackish + keysort schwartzian 6.83/s -- -37% -46% -52% + -76% grt 10.9/s 59% -- -13% -24% + -61% ukeysort 12.6/s 84% 16% -- -12% + -55% gloryhackish 14.2/s 108% 31% 13% -- + -50% keysort 28.2/s 312% 159% 124% 98% + --` [download] None of these machines are memory constrained, none are swapping, and none are particularly busy doing other things. All are running the same version of Debian with the stock kernels, running the exact same test code just copied from machine to machine. keysort is always out front, but unlike my AMD64 the gloryhackish stays in the #2 spot every time, while the others wiggle around depending upon the machine used. 'Tis a puzzle, seems to point toward something at a very low level being the source of the disparate numbers. The only conclusion I can come up with is that keysort is the way to go when a large number of IP addresses must be sorted in a time constrained environment, gloryhackish might be a good choice if no non-core module dependencies are allowed, even though it might not always be the second best choice. It's good to know about Sort::Key, in any event.	[reply] [d/l] [select]
Re^3: Sorting IP addresses, lots of them, quickly by salva (Canon) on May 16, 2007 at 20:58 UTC
The ukeysort variant is slower than the others because converting from the IPv4 dot notation to an unsigned int in Perl is much, much, much, slower than calling inet_aton. If an XSUB were used to make the conversion ukeysort would be the fastest! But anyway, as the data set size increases, the conversion cost from IPv4 to uint would become less decisive than the comparison function used inside the sort, and at some point the ukeysort solution would become the fastest of all the alternatives (and BTW, it also uses less memory! :-)	[reply]
Re^4: Sorting IP addresses, lots of them, quickly by salva (Canon) on May 16, 2007 at 23:12 UTC
If an XSUB were used to make the conversion ukeysort would be the fastest! And to show it, I have just uploaded Sort::Key::IPv4 to CPAN ;-) Now, adding this test to the benchmarks... `use Sort::Key::IPv4 qw(ipv4sort); sub ipv4ks { my @sorted = ipv4sort @address; }` [download] I get for 10_000 addresses: `Rate gloryhackish ukeysort keysort ipv4sor +t gloryhackish 6.14/s -- -31% -54% -79 +% ukeysort 8.91/s 45% -- -34% -70 +% keysort 13.5/s 119% 51% -- -54 +% ipv4sort 29.4/s 378% 229% 118% - +-` [download] For 100_000: `s/iter gloryhackish ukeysort keysort ipv4sor +t gloryhackish 1.80 -- -31% -33% -78 +% ukeysort 1.25 44% -- -3% -68 +% keysort 1.21 49% 3% -- -67 +% ipv4sort 0.400 350% 212% 202% - +-` [download] And for 1_000_000: `s/iter gloryhackish keysort ukeysort ipv4sor +t gloryhackish 24.3 -- -29% -46% -79 +% keysort 17.2 41% -- -24% -71 +% ukeysort 13.2 85% 31% -- -62 +% ipv4sort 4.99 388% 245% 164% - +-` [download]	[reply] [d/l] [select]
Re^5: Sorting IP addresses, lots of them, quickly by gloryhack (Deacon) on May 17, 2007 at 03:57 UTC
Re^5: Sorting IP addresses, lots of them, quickly by monarch (Priest) on May 17, 2007 at 06:36 UTC
Re^6: Sorting IP addresses, lots of them, quickly by salva (Canon) on May 21, 2007 at 17:31 UTC
Re^6: Sorting IP addresses, lots of them, quickly by salva (Canon) on May 17, 2007 at 06:53 UTC