Extracting a domain name from a url

abachus has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Extracting a domain name from a url by rhesa (Vicar) on Oct 15, 2006 at 14:30 UTC
Check out URI. `use URI; my $uri = URI->new( 'http://yahoo.co.uk/notfound.html' ); print $uri->host;`
Re: Extracting a domain name from a url by blazar (Canon) on Oct 15, 2006 at 15:27 UTC
From -> http://yahoo.co.uk/notfound.html extract only -> yahoo.co.uk

rhesa already suggested using a specialized module, which is generally the superior solution, but this shouldn't be hard to do with a match or split either, in which case some familiarity with elementary regexen should help you. In particular, the following should be fine for you: `my $url='http://yahoo.co.uk/notfound.html'; my $host=(split m(/+), $url)[1];`
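To make it clearer why index 1 is the one to take, here is a small sketch of the same split wrapped in a sub. The name `host_via_split` is made up for illustration and is not from the post.

```perl
use strict;
use warnings;

# Hypothetical helper name; it only wraps the split shown in the post.
sub host_via_split {
    my ($url) = @_;
    # Splitting on runs of slashes turns 'http://yahoo.co.uk/notfound.html'
    # into ('http:', 'yahoo.co.uk', 'notfound.html'), so element 1 is
    # whatever sits between the '//' and the next '/'.
    my @parts = split m{/+}, $url;
    return $parts[1];
}

print host_via_split('http://yahoo.co.uk/notfound.html'), "\n";   # yahoo.co.uk
```

Note that this returns everything between the slashes untouched, which is fine for plain `http://host/path` urls only.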
Re^2: Extracting a domain name from a url by rhesa (Vicar) on Oct 15, 2006 at 18:26 UTC
Yup, your solution does work for the majority of urls. For the record, here are some urls that wouldn't be handled properly with your regex: `http://proxy.aol.com:8080/ http://user:pass@yahoo.com/login`
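A quick way to see the difference on those two urls is to run both approaches side by side. This is only a sketch: `host_via_uri` is a made-up helper name, and it assumes the URI module from CPAN is installed.

```perl
use strict;
use warnings;
use URI;    # CPAN module, not part of core Perl

# Hypothetical helper name; URI->new and ->host are the module's own methods.
sub host_via_uri {
    my ($url) = @_;
    return URI->new($url)->host;
}

# Single quotes matter here: "@yahoo" would interpolate in double quotes.
for my $url ('http://proxy.aol.com:8080/', 'http://user:pass@yahoo.com/login') {
    # The split keeps the port or the user:pass@ prefix glued to the host.
    my $naive = (split m{/+}, $url)[1];
    printf "%-34s split: %-20s URI: %s\n", $url, $naive, host_via_uri($url);
}
```

The split column still carries `:8080` and `user:pass@`, while `host` returns just the host name in both cases.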
Re^3: Extracting a domain name from a url by abachus (Monk) on Oct 15, 2006 at 23:02 UTC
thanks monks, as always you show light on things i cannot see :)
Re^3: Extracting a domain name from a url by ministry (Scribe) on Oct 16, 2006 at 01:10 UTC
This is one I've been using in production for a while, and it seems to hold up well: `my $url="http://proxy.aol.com:8080/login"; my($host)=$url=~/http:\/\/([^\/]+)/;`

word! -Ev

Update: Sorry, I'm a moe - I forgot to add the point of this post in my example!

Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
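As written, that regex captures everything up to the first slash, so for the url above `$host` ends up as `proxy.aol.com:8080`. If the port and any `user:pass@` prefix need to go, one possible variation looks like the sketch below. The sub name `host_via_regex` is hypothetical, it also accepts `https`, and it makes no attempt to handle bracketed IPv6 hosts.

```perl
use strict;
use warnings;

# Hypothetical helper; a variation on the regex from the post, not the poster's code.
sub host_via_regex {
    my ($url) = @_;
    my ($host) = $url =~ m{
        \A https?://        # scheme, http or https only
        (?: [^/\@]* \@ )?   # optional userinfo before the host, discarded
        ( [^/:?\#]+ )       # host: stop at port, path, query or fragment
    }xi;
    return $host;           # undef when the url does not match
}

print host_via_regex('http://proxy.aol.com:8080/login'), "\n";   # proxy.aol.com
```

For anything beyond quick scripts, the URI module suggested earlier in the thread is still the safer route.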
Re^4: Extracting a domain name from a url by ikegami (Patriarch) on Oct 16, 2006 at 01:49 UTC
Re: Extracting a domain name from a url by fenLisesi (Priest) on Oct 16, 2006 at 09:03 UTC
When you match your url against `$RE{URI}{HTTP}{-keep}`, what you want will be in `$3`, which you should extract immediately after the match. The sample code (behind the readmore) prints:

`http://yahoo.co.uk/notfound.html => yahoo.co.uk`
`http://yahoo.co.uk:80/notfound.html?fn=john&ln=doe => yahoo.co.uk`
`http://yahoo.co.uk/ => yahoo.co.uk`
`http://yahoo.co.uk => yahoo.co.uk`
`yahoo.co.uk/notfound.html => No match`
`yahoo.co.uk => No match`
`potato://yahoo.co.uk => No match`
`http://yahoo.co.uk:80/notfound.html?fn=john&ln=doe => http yahoo.co.uk 80 /notfound.html?fn=john&ln=doe notfound.html?fn=john&ln=doe notfound.html fn=john&ln=doe`

See Regexp::Common and Regexp::Common::URI::http.

Update: Added <readmore>
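The original sample code is not reproduced here, so the following is only a minimal sketch of the idea, not fenLisesi's code. It assumes Regexp::Common is installed from CPAN; the sub name `host_via_regexp_common` and the choice to anchor the pattern with `\A` and `\z` are assumptions made for this example.

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);   # CPAN module, not part of core Perl

# Hypothetical helper name. With {-keep}, the host is captured in $3.
sub host_via_regexp_common {
    my ($url) = @_;
    my $re = $RE{URI}{HTTP}{-keep};
    # Read $3 right away, before another match can overwrite it.
    return $url =~ m{\A$re\z} ? $3 : undef;
}

for my $url ('http://yahoo.co.uk/notfound.html', 'yahoo.co.uk') {
    my $host = host_via_regexp_common($url);
    print "$url => ", (defined $host ? $host : 'No match'), "\n";
}
```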

