in reply to regex to extract fully-qualified domain name from full URL

tadman is right about using modules, but thinking just about the regexp, I think
^(http://[^/]*)
would do what you want, working from your example above...

my $foo = "http://www.hostname.co.uk/foo/bar";
print $foo =~ m#^(http://[^/]*)#;
outputs http://www.hostname.co.uk. It should also work ok with IP addresses and the http://username:password@hostname/ format, etc., though in that last case the match includes the username:password@ part as well as the hostname.
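For what it's worth, here is a minimal sketch that runs the same pattern over the three cases mentioned above. The helper name http_prefix and the sample IP address are my own inventions for illustration, nothing more.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is made up for this sketch): returns the
# scheme plus everything up to the first slash after "http://", or undef
# when the pattern does not match.
sub http_prefix {
    my ($url) = @_;
    my ($prefix) = $url =~ m#^(http://[^/]*)#;
    return $prefix;
}

# Sample URLs; the IP address is an arbitrary example value.
for my $url (
    'http://www.hostname.co.uk/foo/bar',
    'http://192.168.0.1/index.html',
    'http://username:password@hostname/',
) {
    my $prefix = http_prefix($url);
    print defined $prefix ? "$prefix\n" : "no match for $url\n";
}
```

Since the pattern only stops at the first slash, the third URL comes back with username:password@ still attached, so a further step would be needed if only the bare hostname is wanted.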

andy.

Re^2: regex to extract fully-qualified domain name from full URL
by tadman (Prior) on Mar 22, 2001 at 23:45 UTC
    To be more thorough, perhaps:
        ^(https?://[^/]*)
    Or further:
        ^((?:https?|mailto)://[^/]*)
    You should also hope that your 'username' and 'password' do not contain any slashes. The only restriction would appear to be that the username cannot contain a ':', and the password cannot contain an '@', though this could be browser dependent.
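A rough sketch of the slash concern, using the https? pattern above. The helper name site_prefix and the sample URLs are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): applies the
# https? variant of the pattern and returns the capture, or undef.
sub site_prefix {
    my ($url) = @_;
    my ($prefix) = $url =~ m#^(https?://[^/]*)#;
    return $prefix;
}

# The second URL carries a raw, unencoded slash inside the password.
for my $url (
    'https://www.hostname.co.uk/foo/bar',
    'http://user:pa/ss@hostname/path',
) {
    my $prefix = site_prefix($url);
    print "$url => ", (defined $prefix ? $prefix : 'no match'), "\n";
}
```

With a raw slash in the password, the match stops at http://user:pa and never reaches the hostname, which is exactly the failure to watch out for.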

      ooo, you're quite right - https completely slipped my mind.
      I'd have to agree with
      ^(https?://[^/]*)
      But frankly, if you're going to include mailto, I think by rights all the other (multifarious) possibilities ought to match as well... in which case it really is time to reach for a module, as you initially suggested.

      I disagree with you about possible slashes, at signs and other funny characters in the username and password though. My (cursory) examination of the RFCs indicates they're both 'unsafe' and 'reserved', and RFC1738 says: 'Within the user and password field, any ":", "@", or "/" must be encoded.' I'm not sure this is still the current RFC though (?). And this seems to make sense, given that the slash is a delimiter within the URL.
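To illustrate the encoding point, here is a small sketch that percent-encodes those characters before the URL is built, so the regexp never sees a stray slash. The helper encode_userinfo and the sample credentials are hypothetical, and it only handles ":", "@", "/" and "%" itself; a real URL-escaping module would cover far more.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): percent-encodes ":", "@"
# and "/" in a username or password, plus "%" so that the result stays
# unambiguous.
sub encode_userinfo {
    my ($text) = @_;
    $text =~ s{([%:\@/])}{sprintf('%%%02X', ord $1)}ge;
    return $text;
}

# Made-up credentials containing an at sign and a slash.
my $user = 'andy';
my $pass = 'p@ss/word';

my $url = sprintf 'http://%s:%s@hostname/foo',
    encode_userinfo($user), encode_userinfo($pass);
print "$url\n";

# The first raw slash after the scheme is now the real path delimiter.
my ($prefix) = $url =~ m#^(https?://[^/]*)#;
print "$prefix\n";
```

Once encoded, the password becomes p%40ss%2Fword and the pattern captures everything up to the real path delimiter.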

      andy.

      looking into it further... RFC1738 superseded by RFC2396... but I need to go and do some Real Work... ;)