http://qs1969.pair.com?node_id=578383

abachus has asked for the wisdom of the Perl Monks concerning the following question:

Good day all,

I would like to extract a domain name from an arbitrary URL. A nice easy example:


From -> http://yahoo.co.uk/notfound.html
extract only -> yahoo.co.uk

The data will be coming from a UserAgent request header, so I will need to grab the GET *url* HTTP/1.1 line and work from there. Any thoughts on the best way to do this?



many thanks,

Isaac Close.

Re: Extracting a domain name from a url
by rhesa (Vicar) on Oct 15, 2006 at 14:30 UTC
    Check out URI.
    use URI;
    my $uri = URI->new( 'http://yahoo.co.uk/notfound.html' );
    print $uri->host;
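Since the question starts from a raw "GET *url* HTTP/1.1" line, here is a minimal sketch of combining that with URI. The helper name host_from_request_line and the request-line regex are my own assumptions, not part of URI. It also assumes the request target is an absolute URL (as a proxy would see it); a plain "/notfound.html" target carries no host, so the helper returns nothing for it.

```perl
use strict;
use warnings;
use URI;    # from CPAN, not a core module

# Hypothetical helper: pull the URL out of a request line such as
# "GET http://yahoo.co.uk/notfound.html HTTP/1.1" and return its host.
sub host_from_request_line {
    my ($line) = @_;

    # Method, request target, protocol version, separated by whitespace.
    my ($target) = $line =~ m{^[A-Z]+\s+(\S+)\s+HTTP/\d+\.\d+\s*$}
        or return;

    my $uri = URI->new($target);

    # Relative targets (e.g. "/notfound.html") have no scheme, and some
    # schemes (e.g. mailto:) have no host method at all.
    return unless $uri->scheme && $uri->can('host');
    return $uri->host;
}

# prints "yahoo.co.uk"
print host_from_request_line('GET http://yahoo.co.uk/notfound.html HTTP/1.1'), "\n";
```

URI's host method leaves out any user:pass@ prefix and any :port suffix, so those cases need no extra handling here.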
Re: Extracting a domain name from a url
by blazar (Canon) on Oct 15, 2006 at 15:27 UTC
    From -> http://yahoo.co.uk/notfound.html
    extract only -> yahoo.co.uk

    rhesa already suggested using a specialized module, which is generally the superior solution, but this shouldn't be hard to do with a match or split, in which case some familiarity with elementary regexen should help you. In particular the following should be fine for you:

    my $url='http://yahoo.co.uk/notfound.html';
    my $host=(split m(/+), $url)[1];
      Yup, your solution does work for the majority of urls.

      For the record, here are some urls that wouldn't be handled properly with your regex:

      http://proxy.aol.com:8080/
      http://user:pass@yahoo.com/login
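For those two urls, a hand-rolled match has to skip an optional user:pass@ prefix and stop before a :port suffix. A rough sketch using only core Perl follows; the helper name host_of is made up for illustration, and the pattern is a simplification that assumes http or https and does not cope with bracketed IPv6 hosts.

```perl
use strict;
use warnings;

# Hypothetical helper: return the host part of an http/https URL,
# skipping any "user:pass@" prefix and stopping before any ":port".
# Returns undef when the string does not look like an http(s) URL.
sub host_of {
    my ($url) = @_;
    # scheme, optional userinfo, then the host up to a port, path,
    # query or fragment delimiter
    my ($host) = $url =~ m{^https?://(?:[^/?#\@]*\@)?([^/?#:\@]+)}i;
    return $host;
}

for my $url (qw(
    http://yahoo.co.uk/notfound.html
    http://proxy.aol.com:8080/
    http://user:pass@yahoo.com/login
)) {
    printf "%s => %s\n", $url, host_of($url) // 'No match';
}
```

A module such as URI is still the safer choice; this only shows what the extra cases cost in a regex.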
        Thanks monks, as always you shed light on things I cannot see :)
        This one is one I've been using in production for a while, and seems to hold up well:

        my $url="http://proxy.aol.com:8080/login";
        my($host)=$url=~/http:\/\/([^\/]+)/;

        word!
        -Ev

        Update: Sorry, I'm a moe - I forgot to add the point of this post in my example!

        Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
Re: Extracting a domain name from a url
by fenLisesi (Priest) on Oct 16, 2006 at 09:03 UTC
    When you match your url against $RE{URI}{HTTP}{-keep}, what you want will be in $3, which you should extract immediately after the match. Running this against a few sample urls prints:

    http://yahoo.co.uk/notfound.html => yahoo.co.uk
    http://yahoo.co.uk:80/notfound.html?fn=john&ln=doe => yahoo.co.uk
    http://yahoo.co.uk/ => yahoo.co.uk
    http://yahoo.co.uk => yahoo.co.uk
    yahoo.co.uk/notfound.html => No match
    yahoo.co.uk => No match
    potato://yahoo.co.uk => No match
    http://yahoo.co.uk:80/notfound.html?fn=john&ln=doe => http yahoo.co.uk 80 /notfound.html?fn=john&ln=doe notfound.html?fn=john&ln=doe notfound.html fn=john&ln=doe
    See Regexp::Common and Regexp::Common::URI::http.
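A minimal sketch of how such a match could look, assuming Regexp::Common is installed from CPAN. The helper name http_host and the ^...$ anchoring are my own choices and not necessarily what the sample code above used. By default the pattern accepts only the http scheme.

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);    # from CPAN, not a core module

# Hypothetical helper: return the host of an http URL, or undef if the
# whole string is not an http URI according to Regexp::Common.
sub http_host {
    my ($url) = @_;
    # With -keep, $3 holds the host; copy it out right after the match,
    # before another regex can overwrite the capture variables.
    return $url =~ /^$RE{URI}{HTTP}{-keep}$/ ? $3 : undef;
}

for my $url ('http://yahoo.co.uk:80/notfound.html?fn=john&ln=doe',
             'yahoo.co.uk/notfound.html') {
    printf "%s => %s\n", $url, http_host($url) // 'No match';
}
```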
