Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks, your wisdom is required, for I am severely lacking any this Tuesday morning.
What is the simplest way to sort an array of domain names by their extensions?

Incorrect method (simplified):
my @regexps=('\.co\.uk$','[\w\-]+\.com$','\.pl$','\.uk\.com$');
my @domains=qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);
while(<@regexps>) {
    while(<@domains>) {
        if (/$regexp/i) {
            ......

Ideally I would end up with a list of domains in extension order, with any that don't match tacked on at the end. In this case:

blah.co.uk zzzz.co.uk foo.com bar.uk.com perl.pl weirdext.za

Theory 1.003 alpha was to reverse each string and sort, but .com and .uk.com domains (for example) would spoil this.
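A minimal sketch of Theory 1.003 alpha, run on the domains from the question (the helper name `reversed_sort` is hypothetical), shows how it goes wrong: reversing character by character turns bar.uk.com into moc.ku.rab, which sorts ahead of moc.oof, so bar.uk.com lands before foo.com; .pl (reversed "lp") slots in between .co.uk and .com; and the unmatched weirdext.za comes first instead of last.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort domains by their character-reversed form.
sub reversed_sort {
    my @domains = @_;
    return map  { scalar reverse $_ }   # undo the reversal
           sort                         # plain string sort on reversed text
           map  { scalar reverse $_ }   # "foo.com" -> "moc.oof"
           @domains;
}

my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);
print "$_\n" for reversed_sort(@domains);
# weirdext.za, blah.co.uk, zzzz.co.uk, perl.pl, bar.uk.com, foo.com
```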

Many thanks.
Paul Faulkner

Re: sorting domains by extention
by demerphq (Chancellor) on Aug 27, 2002 at 10:37 UTC
    Use a Schwartzian Transform (ST) or Guttman Rosler Transform (GRT)

    ST:

    my @domains=qw( blah.co.uk zzzz.co.uk foo.com bar.uk.com perl.pl
                    weirdext.za google.de google.ca google.ru );

    # ST :
    my @list=map  { shift @$_ }
             sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
             map  { [ $_, m/(\..*)$/ ] } @domains;
    print join("\n",@list),"\n\n";

    # GRT : (My preference for a variety of reasons, notably speed)
    @list=map  { substr($_,index($_,"\0")+1) }
          sort
          map  { join("\0",m/(\..*)$/,$_) } @domains;
    print join("\n",@list),"\n\n";

    # ST : With extra ordering criterion
    my %legal=map { $_ => 1 } qw(.co.uk .foo .com .edu);
    @list=map  { shift @$_ }
          sort { $b->[2] <=> $a->[2] || $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
          map  { my ($ext)=m/(\..*)$/; [ $_, $ext, $legal{$ext} ] } @domains;
    print join("\n",@list),"\n\n";

    __END__
    Outputs:
    ----------
    google.ca
    blah.co.uk
    zzzz.co.uk
    foo.com
    google.de
    perl.pl
    google.ru
    bar.uk.com
    weirdext.za

    google.ca
    blah.co.uk
    zzzz.co.uk
    foo.com
    google.de
    perl.pl
    google.ru
    bar.uk.com
    weirdext.za

    blah.co.uk
    zzzz.co.uk
    foo.com
    google.ca
    google.de
    perl.pl
    google.ru
    bar.uk.com
    weirdext.za
    Note: I updated this node with the extra criterion and a minor typo fix.
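One way to combine the ST above with the ordered list from the question, so that domains matching no pattern end up last, is sketched here. The pattern list, the `rank` helper and the `rank_sort` wrapper are assumptions for illustration, not demerphq's code: each domain is ranked by the first pattern it matches, unmatched domains get a rank one past the end, and the name breaks ties.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed preference order, adapted from the question's @regexps.
# .uk.com is left out because \.com$ already matches it.
my @patterns = (qr/\.co\.uk$/i, qr/\.com$/i, qr/\.pl$/i);

# Hypothetical helper: index of the first pattern a domain matches,
# or scalar(@patterns) so that unmatched domains sort last.
sub rank {
    my ($domain) = @_;
    for my $i (0 .. $#patterns) {
        return $i if $domain =~ $patterns[$i];
    }
    return scalar @patterns;
}

# Schwartzian Transform: decorate with [name, rank], sort, undecorate.
sub rank_sort {
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] }
           map  { [ $_, rank($_) ] } @_;
}

my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);
print "$_\n" for rank_sort(@domains);
```

Breaking ties on the name puts bar.uk.com ahead of foo.com (compare BrowserUk's reply); to reproduce the question's exact order you would break ties on each domain's original position instead.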

    Yves / DeMerphq
    ---
    Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)
    This was my Pentium Post! (686)

Re: sorting domains by extention
by Abigail-II (Bishop) on Aug 27, 2002 at 11:13 UTC
    You don't need a full sort; all you want is to put the domains into the proper buckets. Here's my solution:
    #!/usr/bin/perl

    use strict;
    use warnings 'all';

    my @regexes = map {qr /\.$_$/}
                      qr {co\.uk},
                      qr {com},      # No need for the [\w\-] prefix.
                      qr {pl},
                    # qr {uk\.com},  # This one already gets grabbed by \.com
    ;

    my @domains = qw {foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk};

    my @buckets = map {[]} @regexes, 1;

    DOMAIN:
    foreach my $domain (@domains) {
        for (my $i = 0; $i < @regexes; $i ++) {
            next unless $domain =~ qr /$regexes[$i]/;
            push @{$buckets [$i]} => $domain;
            next DOMAIN;
        }
        push @{$buckets [-1]} => $domain;
    }

    print "$_\n" for map {@$_} @buckets;

    __END__
    blah.co.uk
    zzzz.co.uk
    foo.com
    bar.uk.com
    perl.pl
    weirdext.za
    Abigail
Re: sorting domains by extention
by BrowserUk (Patriarch) on Aug 27, 2002 at 14:20 UTC

    My contribution, cos it had to be done:^). Probably not the most efficient solution, but simple.

    #! perl -w

    my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

    my @sorted = map  { join '.', reverse split( /\|/, $_ ) }
                 sort
                 map  { join '|', reverse split( /\./, $_, 2 ) } @domains;

    { local $" = "\n"; print "@sorted"; }

    __END__
    # Output
    C:\test>193114
    blah.co.uk
    zzzz.co.uk
    foo.com
    perl.pl
    bar.uk.com
    weirdext.za
    C:\test>

    I know that the result is slightly different from your 'desired output' example, but I thought about this for a long time, and whilst I'm probably wrong, as no one else has mentioned it, I can see no criterion by which bar.uk.com could be sorted into the position you have it in.

    If it's grouped with foo.com because they both have a .com extension, then bar.uk sorts before foo.

    If it's after foo.com because .uk.com is lexically greater than .com, then .uk.com is also greater than .pl, so bar.uk.com belongs after perl.pl, which is what I think you are asking for.
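Both readings can be checked directly with cmp. This is a small sketch of the comparisons described in the two paragraphs above, not code from the thread; the variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reading 1: group by .com, then compare what precedes it.
my $by_name    = "bar.uk"  cmp "foo";   # -1: bar.uk sorts before foo
# Reading 2: compare everything after the first dot.
my $by_ext_com = ".uk.com" cmp ".com";  #  1: .uk.com sorts after .com
my $by_ext_pl  = ".uk.com" cmp ".pl";   #  1: ...and after .pl as well

printf "%d %d %d\n", $by_name, $by_ext_com, $by_ext_pl;   # prints "-1 1 1"
```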


    What's this about a "crooked mitre"? I'm good at woodwork!
      Just thought I'd mention that this is essentially a form of GRT.
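The inner map is what makes it a GRT: it packs the sort key (the extension) and the payload (the host part) into a single string, so Perl's default string sort does all the comparing with no sort block. A sketch that prints the packed strings for the question's domains (variable names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

# Decorate: "blah.co.uk" -> "co.uk|blah" (key and payload in one string).
my @packed = map { join '|', reverse split /\./, $_, 2 } @domains;

# Default string sort; no comparison block is needed.
my @sorted_packed = sort @packed;
print "$_\n" for @sorted_packed;
# co.uk|blah, co.uk|zzzz, com|foo, pl|perl, uk.com|bar, za|weirdext
```

The separator's character code matters when one extension is a prefix of another: '|' sorts after lowercase letters, so a key like "co|x" would land after "com|y", whereas the "\0" separator used in the GRT earlier in the thread keeps the shorter key first.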

      ++

      Yves / DeMerphq

Re: sorting domains by extention
by Aristotle (Chancellor) on Aug 27, 2002 at 11:04 UTC
    In this specific case you don't need a full-blown GRT. The simpler approach is less efficient, of course, but you won't notice that until you start sorting tens of thousands of domains, and I find it far more readable.
    my @sorted_domain = map { join ".", reverse split /\./ } sort map { join ".", reverse split /\./ } @domains;
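Running this one-liner on the question's domains (a sketch; only the domain list is taken from the question) shows what it actually produces: reversing the labels groups domains by their last label, so .com and .uk.com come first, and the .co.uk domains fall under .uk rather than leading the list as the question wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

# Reverse the order of the dot-separated labels, sort, then restore.
my @sorted_domains = map  { join ".", reverse split /\./ }
                     sort
                     map  { join ".", reverse split /\./ } @domains;

print "$_\n" for @sorted_domains;
# foo.com, bar.uk.com, perl.pl, blah.co.uk, zzzz.co.uk, weirdext.za
```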
    Argh. I'm not paying attention.

    Makeshifts last the longest.