
I have what's probably a simple question for my fellow monks, but regular expressions are one of my weaknesses; I am just unable to wrap my head around anything more than the basics:

$var =~ m/\w/i;

That brings me to my problem. I need a rather complicated regex: one that can extract a domain name from a string which could be anything from a full URL:

http://www.perlmonks.org/?node=Seekers%20of%20Perl%20Wisdom

or an email address, or even a bare string:

www.sub.sub2.domain.com
domain.com
ftp.domain.co.uk
adsl-44-33-22-11.dsl.bcvloh.sbcglobal.net

I would actually like it to return two results, provided the entered string was more than just domain.com. Using the last line above as my example, the two results would need to be:

bcvloh.sbcglobal.net
sbcglobal.net

Also, I'd need to make sure that if an international domain name or URL were given, it checked for that and returned:

some.domain.com.au
domain.com.au

Again, that is provided the string was more than just domain.com.br. If only the bare minimum was entered, it should return just that:

domain.com
domain.co.uk
domain.fm
domain.name
..etc, etc..
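For illustration, the requirement above might be sketched as follows. This is only a rough sketch under stated assumptions: the sub name candidate_domains and the three-entry %two_label_suffix table are made up for this example, and a real program would need a far more complete suffix list (the Public Suffix List is the usual source) than the handful hard-coded here.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny table of two-label suffixes.
# A real program would need a much fuller list.
my %two_label_suffix = map { $_ => 1 } qw(co.uk com.au com.br);

# Return up to two names for a bare host name: the registered domain
# plus one extra label (when there is one), then the registered domain.
sub candidate_domains {
    my ($host) = @_;
    my @labels = split /\./, lc $host;

    # The suffix is the last two labels if they are in the table, else the last one.
    my $suffix_len =
        (@labels >= 2 && $two_label_suffix{"$labels[-2].$labels[-1]"}) ? 2 : 1;
    my $keep = $suffix_len + 1;        # number of labels in the registered domain
    return () if @labels < $keep;      # nothing but a suffix, e.g. "co.uk"

    my @results;
    push @results, join '.', @labels[ -($keep + 1) .. -1 ] if @labels > $keep;
    push @results, join '.', @labels[ -$keep .. -1 ];
    return @results;
}

for my $host (qw(adsl-44-33-22-11.dsl.bcvloh.sbcglobal.net some.domain.com.au domain.co.uk)) {
    print join(' ', candidate_domains($host)), "\n";
}
```

With this table, the first host yields bcvloh.sbcglobal.net and sbcglobal.net, the second yields some.domain.com.au and domain.com.au, and domain.co.uk comes back unchanged as a single result.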

Now, I've searched and read a couple of nodes here that are very similar to this question, but they aren't quite enough for me to work with. One splits up a domain name such as domain.com to extract domain, and the other focuses only on http:// URLs. I've also searched Google, and the results I've found again don't give me quite enough to work with, as I am rather dense when it comes to regex.

Many Thanks Fellow Monks,

jnbek

=== Update ===

It looks like I have actually been looking at this from the wrong angle, and I have managed to make myself feel like the silly n00b that I am. I only need a regex to strip the extra characters off the front and back, leaving basically what sits between the /'s: strip off http://, ftp://, etc. from the left, then strip everything from the first /, ? or # on the right. After that, I can split on the dots and use the pop() function a couple of times with a join to get the domain name. So, whether it's sub1.sub2.sub3.foo.bar.www.domain.com or domain.com, I get domain.com to work with. So far I've only got initial test code for the pop() usage:

my $d = "spam.yomama.www.zoelife4u.org";
my @domain = split(/\./, $d);

my $tld = pop(@domain);        # org
my $baredomain = pop(@domain); # zoelife4u

my @result = ( $baredomain, $tld );
my $maindomain = join(".", @result);

print "End: $maindomain\n";
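The stripping step described above, followed by the same pop() approach, might look roughly like this. It is a minimal sketch, not a tested solution: the sub name extract_host is made up here, the email address is an invented example, and the patterns assume reasonably well-formed input rather than validating it.

```perl
use strict;
use warnings;

# Reduce a URL, email address, or bare name to its host part.
# extract_host is a hypothetical name chosen for this sketch.
sub extract_host {
    my ($s) = @_;
    $s =~ s{^[a-z][a-z0-9+.-]*://}{}i;   # drop a leading scheme such as http:// or ftp://
    $s =~ s{[/?#].*}{}s;                 # drop everything from the first /, ? or #
    $s =~ s{^.*\@}{};                    # drop user info or the local part of an email
    $s =~ s{:\d+\z}{};                   # drop a trailing :port
    return lc $s;
}

for my $input (
    'http://www.perlmonks.org/?node=Seekers%20of%20Perl%20Wisdom',
    'someone@mail.example.com',
    'spam.yomama.www.zoelife4u.org',
) {
    my @labels     = split /\./, extract_host($input);
    my $tld        = pop @labels;
    my $baredomain = pop @labels;
    printf "%s => %s.%s\n", $input, $baredomain, $tld;
}
```

One caveat with taking only the last two labels: a name such as ftp.domain.co.uk would come out as co.uk, so the two-label suffixes still need separate handling.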

And I think I've found a useful regex to work with here. Based on this, anyone have any critique?

In reply to Regex for extracting a domain name from a string. by jnbek
