All you need is a basic cmp sort (for domain name based URLs):

print for sort <DATA>;
__DATA__
http://google.com
http://google.com/groups
http://google.com/groups/deeper
http://msn.com
http://msn.com/groups
http://msn.com/groups/deeper
http://apache.org
http://apache.org/docs
http://apache.org/docs/mod_perl

Which gives:

http://apache.org
http://apache.org/docs
http://apache.org/docs/mod_perl
http://google.com
http://google.com/groups
http://google.com/groups/deeper
http://msn.com
http://msn.com/groups
http://msn.com/groups/deeper

For numerical addresses you need to sort on 1) the integer representation of the 4-byte value that corresponds to the IP address, then 2) the rest of the URL (if any). This is a little more complex and uses a Schwartzian transform for efficiency. I have assumed dot quads. If you have to deal with other stuff like "127.1" and all the other types of valid IPs, use Socket's inet_aton instead:

use Socket;
my ($ip) = unpack "N", inet_aton($1);

This will probably be a little slower than the raw unpack/pack/split used below.

use strict;
use warnings;
#use Socket;

my @data = qw(
    http://3.3.3.3/docs/mod_perl
    http://3.3.3.3/docs
    http://3.3.3.3
    http://2.2.2.2
    http://10.1.1.1
    http://11.1.1.1
    http://2.2.2.2/groups
    http://2.2.2.2/groups/deeper
    http://1.1.1.1/groups/deeper
    http://1.1.1.1/groups
    http://1.1.1.1
    http://1.1.1.2
    http://1.1.2.1
);

# Schwartzian transform: build [ url, ip, rest ], sort on the keys, unwrap
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] || $a->[2] cmp $b->[2] }
    map  { munge_url($_) } @data;

print "$_\n" for @sorted;

sub munge_url {
    my $addr = $_[0];
    # split into host ($1) and the rest of the URL ($2, may be empty)
    $addr =~ m!^(?:\w+://)?([^/]+)/?(.*)$!;
    # convert dot quad to a sortable integer
    my ($ip) = unpack 'N', pack 'C4', split /\./, $1;
    # or unpack 'N', inet_aton($1);
    my $rest = $2 || '';
    # print "$ip $rest\n";    # uncomment to see the sort keys
    return [ $addr, $ip, $rest ];
}

__DATA__
http://1.1.1.1
http://1.1.1.1/groups
http://1.1.1.1/groups/deeper
http://1.1.1.2
http://1.1.2.1
http://2.2.2.2
http://2.2.2.2/groups
http://2.2.2.2/groups/deeper
http://3.3.3.3
http://3.3.3.3/docs
http://3.3.3.3/docs/mod_perl
http://10.1.1.1
http://11.1.1.1
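To see why the integer key matters, here is a small sketch that compares a plain string sort with a sort on the integer key, and puts the split/pack key next to the inet_aton one. The sub names ip_key_split and ip_key_aton are made up for this illustration and are not part of the code above.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical helper: dot quad -> 32-bit integer via split/pack/unpack.
# Only meaningful for full four-part dot quads.
sub ip_key_split {
    my ($host) = @_;
    return unpack 'N', pack 'C4', split /\./, $host;
}

# Hypothetical helper: same key via Socket's inet_aton, which also accepts
# abbreviated forms such as "127.1". Returns undef if the host cannot be
# converted.
sub ip_key_aton {
    my ($host) = @_;
    my $packed = inet_aton($host);
    return defined $packed ? unpack( 'N', $packed ) : undef;
}

my @hosts = qw( 2.2.2.2 10.1.1.1 1.1.1.1 );

# String comparison looks at characters, so "10..." lands before "2...".
my @by_string = sort { $a cmp $b } @hosts;

# Numeric comparison of the integer keys gives the address order.
my @by_key = sort { ip_key_split($a) <=> ip_key_split($b) } @hosts;

print "cmp: @by_string\n";
print "key: @by_key\n";
```

The cmp line comes out as 1.1.1.1 10.1.1.1 2.2.2.2, while the key line comes out as 1.1.1.1 2.2.2.2 10.1.1.1. The second sort recomputes the key on every comparison, which is exactly the cost the Schwartzian transform avoids by computing each key once.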

There is no logical relation between FQDNs and dot quad IPs (sort-wise) until you resolve the IPs to FQDNs.
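If a list does mix both kinds and you do not want to resolve anything, one option is to keep the two groups apart and sort each with its own rule. This is only a sketch under assumptions of mine: dot quads are put first (an arbitrary choice), a host counts as a dot quad if it is four dot-separated numbers (octets are not range-checked), and the names url_key and sort_urls are hypothetical.

```perl
use strict;
use warnings;

# Assumed rule for this sketch: four dot-separated numbers means dot quad,
# anything else is treated as a name. Octets are not range-checked.
my $DOT_QUAD = qr/^\d{1,3}(?:\.\d{1,3}){3}$/;

# Hypothetical helper: returns [ url, is_name, ip_key, host, rest ].
sub url_key {
    my ($url) = @_;
    my ( $host, $rest ) = $url =~ m!^(?:\w+://)?([^/]+)/?(.*)$!;
    $host = '' unless defined $host;
    $rest = '' unless defined $rest;
    if ( $host =~ $DOT_QUAD ) {
        my $ip = unpack 'N', pack 'C4', split /\./, $host;
        return [ $url, 0, $ip, '', $rest ];
    }
    return [ $url, 1, 0, lc $host, $rest ];
}

# Schwartzian transform over the combined key.
sub sort_urls {
    return map { $_->[0] }
        sort {
               $a->[1] <=> $b->[1]    # dot quads first, names second
            || $a->[2] <=> $b->[2]    # numeric order within dot quads
            || $a->[3] cmp $b->[3]    # string order within names
            || $a->[4] cmp $b->[4]    # then the rest of the URL
        }
        map { url_key($_) } @_;
}

print "$_\n" for sort_urls(qw(
    http://msn.com/groups
    http://10.1.1.1
    http://apache.org
    http://2.2.2.2/groups
    http://2.2.2.2
    http://msn.com
));
```

The two groups never get compared against each other on host, which matches the point that there is no meaningful order between them without a lookup.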

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print


In reply to Re: Sorting URLs on domain/host: sortkeys generation by tachyon
in thread Sorting URLs on domain/host: sortkeys generation by parv
