in reply to Sorting URLs on domain/host: sortkeys generation
This should get close to what you want. It uses the GRT (Guttman-Rosler Transform) method of prepending the sort key to each string prior to sorting and stripping it off afterwards.
To generate the sort key, it extracts the domain name from the URL, splits it on '.' and reverses the order of the sub-components.
E.g. http://news.bbc.co.uk/something.htm
becomes uk.co.bbc.newshttp://news.bbc.co.uk/something.htm (with the non-printing $; separator sitting between the key and the URL).
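The key generation on its own can be sketched roughly as below. The helper name host_key is hypothetical (it does not appear in the code in this post), and it assumes absolute URLs of the form scheme://host/...

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): build the
# reversed-host sort key for one absolute URL.
sub host_key {
    my ($url) = @_;

    # Capture everything between "scheme://" and the next "/".
    # Return an empty key for anything that is not an absolute URL.
    my ($host) = $url =~ m[^\w+://([^/]+)] or return '';

    # news.bbc.co.uk -> uk.co.bbc.news
    return join '.', reverse split /\./, $host;
}

print host_key('http://news.bbc.co.uk/something.htm'), "\n";   # uk.co.bbc.news
```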
This will group .gov.uk together with .co.uk etc. It will also group numeric URLs by their quads in the correct order, though they won't be in strictly numeric order as given. Adding code in the sort block to handle this wouldn't be too hard if it's a requirement.
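One possible way to get numeric hosts into numeric order is sketched below. Note that it does the work in the key rather than in the sort block, which is a different route from the one suggested above: octets are zero-padded so that a plain string sort agrees with numeric order. Keeping dotted-quad hosts in their original octet order (rather than reversing them) is my assumption, and host_key is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical variant of the key builder. Dotted-quad hosts keep their
# octet order and are zero-padded to three digits; named hosts have
# their components reversed as described above.
sub host_key {
    my ($host) = @_;
    if ( $host =~ /^\d+(?:\.\d+){3}$/ ) {
        return join '.', map { sprintf '%03d', $_ } split /\./, $host;
    }
    return join '.', reverse split /\./, $host;
}

my @hosts  = qw( 10.0.0.20 10.0.0.3 news.bbc.co.uk 9.1.1.1 );
my @sorted = sort { host_key($a) cmp host_key($b) } @hosts;

# Numeric hosts come first (digits sort before letters), in numeric order.
print "$_\n" for @sorted;
```

For a long list the same key could be prepended GRT-style instead of being recomputed inside the comparison.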
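    #! perl -slw
    use strict;

    open IN, '<links.dat' or die $!;
    chomp( my @links = <IN> );
    close IN;

    @links = grep m[^\w+://], @links; # remove relative urls.

    my @sorted =
        map{
            # strip the key: keep everything after the first $;
            substr($_, 1+index($_, $;));
        } sort map{
            # capture the host; no trailing '/' is required, so urls
            # without a path (http://example.com) are keyed too
            m[\w+://([^/]+)];
            # reversed host . separator . original url
            join( '.', reverse split /\./, $1 ) . $; . $_;
        } @links;

    print for @sorted;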
You might also consider adding the protocol to the end of the sort key if any of your URLs are non-HTTP links.
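A minimal sketch of that idea follows. The sample links are made up for illustration, and using the same separator between host key, protocol and URL is my choice rather than anything specified above.

```perl
use strict;
use warnings;

# Made-up sample data for illustration only.
my @links = qw(
    http://news.bbc.co.uk/a.htm
    ftp://news.bbc.co.uk/a.htm
    https://perl.org/
);

my $sep = "\034";    # the default value of $;

my @sorted =
    map  { ( split /\Q$sep\E/, $_, 3 )[2] }    # strip key, keep the URL
    sort
    map  {
        my ( $proto, $host ) = m[^(\w+)://([^/]+)];
        # reversed host, then protocol, then the original URL
        join $sep, join( '.', reverse split /\./, $host ), $proto, $_;
    } @links;

print "$_\n" for @sorted;
```

Links on the same host then sort by protocol, so the ftp:// link lands ahead of the http:// one for news.bbc.co.uk.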