shaolin_gungfu has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks, sorry to ask such a simple question, but I never was any good at regular expressions! :-)
Okay, I have a subroutine like so:

# takes a url as input and returns the domain
sub getDomain() {
    my ($url) = shift;
    $url =~ m#^(http://.+?/)(.*)#i;
    return $1;
}

This simple function just takes an address such as http://www.perlmonks.com/index.pl?whatever and turns it into http://www.perlmonks.com/

It seems to work, but I also want to strip out any port number that might be attached to the URL, e.g. http://www.perlmonks.org:8080/index.pl

Can anyone help me with this one?

Many thanks, Tom

Added code tags - dvergin 2002-04-29

Re: Regular Expression Question
by Juerd (Abbot) on Apr 29, 2002 at 19:13 UTC

    Don't use a regex for this. Use the appropriate tool, use URI.


    use URI;
    my $my_url = 'http://www.perlmonks.org:8080/index.pl?whatever';
    my $uri = URI->new($my_url);
    my $host = $uri->host();

    Or, in one go (to illustrate that you do not need to name your objects if you don't need them to stick around):

    use URI;
    my $host = URI->new('http://www.perlmonks.org:8080/index.pl?whatever')->host;

    

Re: Regular Expression Question
by thelenm (Vicar) on Apr 29, 2002 at 19:24 UTC
    Your problem may be solved if you don't try to match the entire URL, but only the part you're interested in, before any colons or slashes:

    sub getDomain {
        my ($url) = @_;
        $url =~ m#(http://[^/:]+)#i;
        return $1;
    }

    This doesn't have exactly the same functionality as your function (specifically, it chops the trailing slash), but that's easy to modify if you want. You may also want to look into the Perl module called "URI".

    Update: changed return $url; to return $1;.
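
For anyone who wants the trailing slash back, the regex above could be adapted roughly as follows. This is only a sketch: the name get_domain is made up for illustration, it assumes plain http:// URLs as in the original question (no https, no user:password@ part), and it returns undef on a failed match rather than relying on whatever $1 held before.

```perl
use strict;
use warnings;

# Hypothetical variant of getDomain: returns "http://host/" with any
# ":port" dropped, or undef when the URL does not look like http://...
sub get_domain {
    my ($url) = @_;
    # Capture "http://host", stopping at the first "/" or ":".
    return undef unless defined $url && $url =~ m{^(http://[^/:]+)}i;
    return "$1/";
}

print get_domain('http://www.perlmonks.org:8080/index.pl'), "\n";
```

Checking the match before using $1 matters because a failed match leaves $1 set to whatever an earlier successful match captured.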

      Thanks. I tried using the URI module like everyone said, but got some errors with the RobotUA module, so I went for the regexp version, which seems to be working. Cheers!
Re: Regular Expression Question
by boo_radley (Parson) on Apr 29, 2002 at 19:19 UTC
    There's a good non-regex way (one of many, I'm sure):

    use URI::URL;
    my $url1 = URI::URL->new('http://perlmonks.org/index.pl?node=boo_radley') or die;
    print join "\n", $url1->crack();

    See URI::URL for more.