Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have three types of URLs:
http://www.url.com
http://www.url.com/
http://www.url.com/cgi.pl?x=y

I want to strip off everything that isn't the domain name. So the above URLs would become:
http://www.url.com or http://www.url.com/
* Note I want the trailing "/" if it's after the domain name

This is my stab at it:
#!/usr/bin/perl -w
use strict;
my $displayed_link = 'http://www.url.com/cgi.pl?x=y';
( $displayed_link ) = $displayed_link =~ m/^(http:\/\/.*?\/)/;
print $displayed_link . "\n";
Notice that I get the URL out, but if it's just a plain http://www.url.com without a trailing slash, I get an error.
#!/usr/bin/perl -w
use strict;
my $displayed_link = 'http://www.url.com';
( $displayed_link ) = $displayed_link =~ m/^(http:\/\/.*?\/)/;
print $displayed_link . "\n";
I've tried a bunch of ways to have it come out with "http://www.url.com" or "http://www.url.com/" regardless of the input, but I can't figure it out.

What's the proper way to do this?

Replies are listed 'Best First'.
Re: Regex Question
by ikegami (Patriarch) on Jan 04, 2009 at 06:27 UTC

    Use URI.

    use strict;
    use warnings;
    use URI qw( );

    for (
        'http://www.example.com/cgi.pl?x=y#anchor',
        'http://www.example.com/cgi.pl?x=y',
        'http://www.example.com/',
        'http://www.example.com',
    ) {
        my $uri = URI->new_abs('/', $_);
        print("$uri\n");
    }

    http://www.example.com/
    http://www.example.com/
    http://www.example.com/
    http://www.example.com/
      Ahh... but I can't use the URI module as it's not installed and I can't get it installed.

        Then you have a broken Perl. It comes with Perl. (Correction: no, it only comes with ActivePerl.)

        What problems are you having while trying to install it?

Re: Regex Question
by GrandFather (Saint) on Jan 04, 2009 at 07:04 UTC

    Using URI is most likely more reliable than rolling your own regex, but in the interests of improving your knowledge of regexen the following may be of help:

    use strict;
    use warnings;

    for (qw(http://www.url.com http://www.url.com/ http://www.url.com/cgi.pl?x=y)) {
        my $displayed_link = $_;
        $displayed_link =~ s!^.*?(\w*://[^/]+/?).*!$1!;
        print $displayed_link, "\n";
    }

    prints:

    http://www.url.com
    http://www.url.com/
    http://www.url.com/
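
    The substitution above may be easier to follow when spread out with the /x modifier. What follows is only an annotated sketch of the same pattern, not GrandFather's original code; the helper name strip_to_domain is hypothetical. Because it is a substitution, input that does not match at all comes back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a URL to scheme://host, keeping one
# trailing slash if the input had a slash right after the host.
sub strip_to_domain {
    my ($url) = @_;
    $url =~ s{
        ^ .*?          # skip any leading text, as little as possible
        (              # start of the captured part
            \w* ://    # scheme followed by ://
            [^/]+      # host: everything up to the next slash
            /?         # optional slash directly after the host
        )
        .*             # whatever follows is thrown away
    }{$1}x;
    return $url;
}

print strip_to_domain($_), "\n"
    for qw(http://www.url.com http://www.url.com/ http://www.url.com/cgi.pl?x=y);
```

    One caveat under this sketch: the host is taken to be "anything that is not a slash", so a URL such as http://www.url.com?x=y (query string with no slash) would keep its query string.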

    Perl's payment curve coincides with its learning curve.
Re: Regex Question
by BrowserUk (Patriarch) on Jan 04, 2009 at 07:01 UTC

    Like this?

    @urls = qw[ http://www.url.com http://www.url.com/ http://www.url.com/cgi.pl?x=y ];;
    m[^( http:// [^/]+ (?: / | $ ) )]x and print "'$1'" for @urls;;
    'http://www.url.com'
    'http://www.url.com/'
    'http://www.url.com/'
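
    The snippet above is written for an interactive shell, so it skips the declarations. A rough equivalent that runs under strict and warnings might look like the sketch below. The helper name leading_domain is hypothetical, and \z is used in place of $ as an assumption on my part, to anchor at the very end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading http://host part of a URL,
# with the slash if one follows the host, or undef if nothing matches.
sub leading_domain {
    my ($url) = @_;
    # After the host, require either a slash or the end of the string.
    return $url =~ m{ ^ ( http:// [^/]+ (?: / | \z ) ) }x ? $1 : undef;
}

my @urls = qw(
    http://www.url.com
    http://www.url.com/
    http://www.url.com/cgi.pl?x=y
);

for my $url (@urls) {
    my $domain = leading_domain($url);
    print "'$domain'\n" if defined $domain;
}
```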

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Regex Question
by Anonymous Monk on Jan 04, 2009 at 06:24 UTC
Re: Regex Question
by Anonymous Monk on Jan 07, 2009 at 16:42 UTC
    1) The parentheses in the pattern control what is output
    2) You haven't done anything to make the trailing slash optional
    I leave the rest of the homework assignment up to you.
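
    For readers who want to check their answer, here is one possible way the two hints could be combined. It is a sketch, not the only solution: the helper name domain_part is hypothetical, and swapping the original .*? for [^/]+ is my assumption, made because a non-greedy .*? followed by an optional slash would be satisfied by matching just "http://".

```perl
use strict;
use warnings;

# Hypothetical helper: return the domain part of an http URL, or undef.
sub domain_part {
    my ($link) = @_;
    # Hint 1: only what is inside the parentheses is returned.
    # Hint 2: "/?" makes the slash after the host optional.
    my ($domain) = $link =~ m{^(http://[^/]+/?)};
    return $domain;    # undef when the pattern does not match
}

for my $link (
    'http://www.url.com',
    'http://www.url.com/',
    'http://www.url.com/cgi.pl?x=y',
) {
    my $domain = domain_part($link);
    print defined $domain ? $domain : '(no match)', "\n";
}
```

    The defined check matters: a match in list context returns an empty list on failure, which leaves the variable undef, and printing that under -w is what produces the complaint in the original post.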