robharper has asked for the wisdom of the Perl Monks concerning the following question:
I have spotted a few related nodes through searching, but can't really find what I'm looking for, so...
I would like to scan a set of files for web URLs and recover just the (fully qualified) domain names, preferably without host names attached. Finding the URLs is no major problem, but the next step troubles me. I realise that there is little or no consistency in how domains are arranged within ccTLDs, so some sort of database of rules would be needed to handle this fully.
Could someone please point me towards a module, program, or data set that might help me out here -- if such exists! If the worst comes, I could just strip the element before the first dot, which would probably do for most purposes, but is there a better way?
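To make the trade-off concrete, here is a minimal sketch in plain Perl of the fallback described above (dropping the element before the first dot), next to a rule-based variant. The function names and the `%two_level_suffix` table are hypothetical and exist only for illustration. The table holds a handful of example suffixes, not a usable rule set, and the URL pattern handles only simple http/https URLs without userinfo.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny rule set: suffixes under which names are
# registered one level further down. A real rule database would be far larger.
my %two_level_suffix = map { $_ => 1 } qw(co.uk org.uk ac.uk com.au co.jp);

# Pull the host part out of a simple http/https URL (assumes no userinfo).
sub host_from_url {
    my ($url) = @_;
    return $url =~ m{^https?://([^/:?#]+)}i ? lc($1) : undef;
}

# Naive fallback: drop everything up to and including the first dot.
sub strip_first_label {
    my ($host) = @_;
    (my $rest = $host) =~ s/^[^.]+\.//;
    return $rest;
}

# Rule-based variant: keep two labels, or three when the last two labels
# match an entry in the suffix table.
sub registered_domain {
    my ($host) = @_;
    my @labels = split /\./, $host;
    return $host if @labels < 2;
    my $keep = 2;
    $keep = 3 if @labels >= 3 && $two_level_suffix{ join '.', @labels[-2, -1] };
    return join '.', @labels[ -$keep .. -1 ];
}

for my $url (qw(http://www.example.com/ http://example.com/ http://a.b.example.co.uk/x)) {
    my $host = host_from_url($url);
    printf "%-28s naive=%-16s rules=%s\n",
        $url, strip_first_label($host), registered_domain($host);
}
```

The naive version is right for `www.example.com` but turns `example.com` into `com` and `a.b.example.co.uk` into `b.example.co.uk`, which is why a maintained set of suffix rules would still be needed.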
Replies are listed 'Best First'.

Re: Stripping domain names from URLs
by tachyon (Chancellor) on Sep 08, 2004 at 13:39 UTC
by robharper (Pilgrim) on Sep 08, 2004 at 13:58 UTC

Re: Stripping domain names from URLs
by dragonchild (Archbishop) on Sep 08, 2004 at 12:46 UTC
by tachyon (Chancellor) on Sep 08, 2004 at 13:55 UTC
by dragonchild (Archbishop) on Sep 08, 2004 at 14:26 UTC

Re: Stripping domain names from URLs
by Fletch (Bishop) on Sep 08, 2004 at 12:51 UTC

Re: Stripping domain names from URLs
by Steve_p (Priest) on Sep 08, 2004 at 12:47 UTC

Re: Stripping domain names from URLs
by wfsp (Abbot) on Sep 08, 2004 at 12:55 UTC