Regex Question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have three types of URLs:
http://www.url.com
http://www.url.com/
http://www.url.com/cgi.pl?x=y

I want to strip off everything that isn't the domain name, so the above URLs would become:
http://www.url.com or http://www.url.com/
* Note: I want to keep the trailing "/" if it comes directly after the domain name.

This is my stab at it:

#!/usr/bin/perl -w
use strict;

my $displayed_link = 'http://www.url.com/cgi.pl?x=y';

( $displayed_link ) = $displayed_link =~ m/^(http:\/\/.*?\/)/;

print $displayed_link . "\n";

Notice that I get the URL out, but if it's just a plain http://www.url.com without a trailing slash, I get an error.

#!/usr/bin/perl -w
use strict;

my $displayed_link = 'http://www.url.com';

( $displayed_link ) = $displayed_link =~ m/^(http:\/\/.*?\/)/;

print $displayed_link . "\n";
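The "error" in the second script is most likely Perl's "Use of uninitialized value" warning rather than a failure inside the regex engine: in list context a failed match returns an empty list, so the assignment leaves $displayed_link undefined. A minimal sketch of that behavior, where the helper name extract_with_slash is hypothetical and not part of the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the question, which
# requires a slash after the host name.
sub extract_with_slash {
    my ($url) = @_;
    # List-context match: on failure the list is empty and $base stays undef.
    my ($base) = $url =~ m/^(http:\/\/.*?\/)/;
    return $base;
}

for my $url ('http://www.url.com/cgi.pl?x=y', 'http://www.url.com') {
    my $base = extract_with_slash($url);
    print defined $base ? "$base\n" : "no match for $url\n";
}
```

Checking the result with defined before printing avoids the warning, but it does not yet handle the URL without a trailing slash.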

I've tried a bunch of ways to have it come out with "http://www.url.com" or "http://www.url.com/" regardless of the input, but I can't figure it out.

What's the proper way to do this?


Replies are listed 'Best First'.
Re: Regex Question by ikegami (Patriarch) on Jan 04, 2009 at 06:27 UTC
Use URI.

use strict;
use warnings;
use URI qw( );

for (
    'http://www.example.com/cgi.pl?x=y#anchor',
    'http://www.example.com/cgi.pl?x=y',
    'http://www.example.com/',
    'http://www.example.com',
) {
    my $uri = URI->new_abs('/', $_);
    print("$uri\n");
}

Output:

http://www.example.com/
http://www.example.com/
http://www.example.com/
http://www.example.com/
Re^2: Regex Question by Anonymous Monk on Jan 04, 2009 at 06:33 UTC
Ahh... but I can't use the URI module as it's not installed and I can't get it installed.
Re^3: Regex Question by Corion (Patriarch) on Jan 04, 2009 at 09:33 UTC
Also see "Yes, even you can use CPAN", or copy the code from URI into your script.
Re^3: Regex Question by ikegami (Patriarch) on Jan 04, 2009 at 06:35 UTC
~~Then you have a broken Perl. It comes with Perl~~ (No, only with ActivePerl) What problems are you having while trying to install it?
Re^4: Regex Question by Anonymous Monk on Jan 04, 2009 at 06:42 UTC
Re^5: Regex Question by ikegami (Patriarch) on Jan 04, 2009 at 06:44 UTC
Re^4: Regex Question by Anonymous Monk on Jan 04, 2009 at 06:43 UTC
Re^5: Regex Question by ikegami (Patriarch) on Jan 04, 2009 at 06:49 UTC
Some notes below your chosen depth have not been shown here
Re: Regex Question by GrandFather (Saint) on Jan 04, 2009 at 07:04 UTC
Using URI is most likely more reliable than rolling your own regex, but in the interests of improving your knowledge of regexen the following may be of help:

use strict;
use warnings;

for (qw(http://www.url.com http://www.url.com/ http://www.url.com/cgi.pl?x=y)) {
    my $displayed_link = $_;
    $displayed_link =~ s!^.*?(\w+://[^/]+/?).*!$1!;
    print $displayed_link, "\n";
}

prints:

http://www.url.com
http://www.url.com/
http://www.url.com/

Perl's payment curve coincides with its learning curve.
Re: Regex Question by BrowserUk (Patriarch) on Jan 04, 2009 at 07:01 UTC
Like this?

@urls = qw[ http://www.url.com http://www.url.com/ http://www.url.com/cgi.pl?x=y ];;
m[^( http:// [^/]+ (?: / | $ ) )]x and print "'$1'" for @urls;;

'http://www.url.com'
'http://www.url.com/'
'http://www.url.com/'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re: Regex Question by Anonymous Monk on Jan 04, 2009 at 06:24 UTC
use URI, http://search.cpan.org/~gaas/URI-1.37/URI.pm#PARSING_URIs_WITH_REGEXP
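For readers who cannot install the module, the linked section of the URI documentation gives a general-purpose regex for splitting a URI into its parts (the same pattern appears in Appendix B of the URI RFC). The following is only a sketch of how it could be applied here; the helper name scheme_and_host is hypothetical, and the choice to always append a trailing slash is an assumption that mirrors the output of the URI->new_abs('/', ...) approach:

```perl
use strict;
use warnings;

# Hypothetical helper: splits a URI with the generic pattern from the
# URI documentation and rebuilds "scheme://authority/".
sub scheme_and_host {
    my ($uri) = @_;
    my ($scheme, $authority, $path, $query, $fragment) =
        $uri =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
    # Give up unless both a scheme and an authority (host) were found.
    return unless defined $scheme && defined $authority;
    return "${scheme}://${authority}/";
}

for my $url (
    'http://www.example.com/cgi.pl?x=y#anchor',
    'http://www.example.com/',
    'http://www.example.com',
) {
    print scheme_and_host($url), "\n";
}
```

Unlike the pattern in the question, this does not depend on the scheme being http, and it stops the host at "?" or "#" as well as at "/".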
Re: Regex Question by Anonymous Monk on Jan 07, 2009 at 16:42 UTC
1) The parentheses in the pattern control what is output
2) You haven't done anything to make the trailing slash optional

I leave the rest of the homework assignment up to you.
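One way to act on both hints, offered as a sketch rather than the poster's intended answer: keep the capturing parentheses around the part to output, match the host with a character class that cannot run past a slash, and follow it with an optional slash. The helper name base_url is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "http://host" plus the slash directly
# after the host if there is one; returns undef when nothing matches.
sub base_url {
    my ($url) = @_;
    # [^/]+ consumes the host; /? makes the trailing slash optional.
    my ($base) = $url =~ m{^(http://[^/]+/?)};
    return $base;
}

for my $url (qw(
    http://www.url.com
    http://www.url.com/
    http://www.url.com/cgi.pl?x=y
)) {
    print base_url($url), "\n";
}
```

For the three URLs in the question this prints http://www.url.com, http://www.url.com/ and http://www.url.com/. Note that a query string attached directly to the host with no slash (for example http://www.url.com?x=y) would be captured as part of the host; using [^/?#]+ instead of [^/]+ would guard against that.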