I recently set out to sort some ip addresses. Here's what I came up with:

First the straightforward method:

sub by_ip { # Split the two ip addresses up into octets my ($a1, $a2, $a3, $a4) = split /\./, $a; my ($b1, $b2, $b3, $b4) = split /\./, $b; # Check to see if the first octets are the same if ($a1 == $b1) { # If the first octets are the same, check # the second octets if ($a2 == $b2) { # Ditto for the third octets if ($a3 == $b3) { # If the first 3 octets # of each address are # the same, return the # comparison of the last # octet $a4 =~ s/^([0-9]+)/$1/; $b4 =~ s/^([0-9]+)/$1/; return $a4 <=> $b4; } else { # 3rd octets were different # so return their comparison return $a3 <=> $b3; } } else { # 2nd octets were different so # return their comparison return $a2 <=> $b2; } } else { # Best case: The first octets were # different so we can return their # comparison immediately return $a1 <=> $b1; } } my @sorted = sort by_ip @unsorted;

This works, but it's slow because we have to split both ip addresses and probably make several equality checks for each address comparison. So can we get away with only mangling each ip address once during the whole operation and only making one numeric comparison? Yes, Schwartzian Transform to the rescue:

my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [$_, sprintf("%03.f%03.f%03.f%03.f", split(/\./, $ +_))] } @unsorted;

Now what's this doing? Let's split it up into three separate statements:

@mapped = map { [$_, sprintf("%03.f%03.f%03.f%03.f", split(/\./, $_))] + } @unsorted; @sorted = sort { $a->[1] cmp $b->[1] } @mapped; @sorted = map { $_->[0] } @sorted;
  1. The first map command takes the list of ip addresses as input and gives us back a new array of arrayrefs. Now each array element in @mapped looks like [ "1.2.3.4", "001002003004" ]. Now we can use the numeric conversion of each ip address to compare it directly with another.
  2. The sort command sorts the members of @mapped by comparing the second element of each (e.g. 001002003004), and leaves us with a sorted array of arrayrefs in @sorted.
  3. The final map command extracts the first element of each member of the sorted array (e.g. "1.2.3.4"), and leaves us with a sorted list of ip addresses.

So why is it faster to use the transform? Well, even in the best case when the first octet of each ip address to be sorted is different, &by_ip() has to make two split()s and two numeric comparisons every time it's called. Since Perl's sort() uses a quicksort algorithm, it runs in O(N*log(N)) time. That makes 2N*log(N) split()s and 2N*log(N) numeric comparisons, where N is the number of addresses in the unsorted list. The transform, on the other hand, only split()s each address once and only makes one numeric comparison for each address comparison. N split()s and N*log(N) comparisons is a big win over 2N*log(N) split()s and 2N*log(N) comparisons.

- Matt


In reply to Sorting ip addresses quickly by DrManhattan

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.