I have what's probably a simple question for my fellow monks, but regular expressions are one of my weaknesses; I just can't wrap my head around anything beyond the basics.

my $var =~ m/\w/i;

Therein lies my problem. I need a rather complicated regex: I need to extract a domain name from a string that could be anything from a full URL:

http://www.perlmonks.org/?node=Seekers%20of%20Perl%20Wisdom

or an email address, or even a bare string:

www.sub.sub2.domain.com
domain.com
ftp.domain.co.uk
adsl-44-33-22-11.dsl.bcvloh.sbcglobal.net

and I would actually like it to return two results, provided the entered string was more than just domain.com. Using the last line as my example, I would need the two results to be:

bcvloh.sbcglobal.net sbcglobal.net

Also, if an international domain name or URL were given, I'd need it to detect that and return:

some.domain.com.au domain.com.au

Again, that's provided the string was more than just domain.com.br. And if only the bare minimum was entered, it should return just that:

domain.com domain.co.uk domain.fm domain.name ..etc, etc..
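The expected results listed above can be sketched as a small function that takes an already-clean host name and returns up to two names. This is only a minimal sketch: `domain_pair` and `%TWO_LABEL_SUFFIX` are hypothetical names, and the suffix table is a deliberately tiny assumption covering just the country-code examples in this post. A real program would need a complete list of such suffixes.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete table of two-label suffixes.
# Assumption: only the examples from this post are covered.
my %TWO_LABEL_SUFFIX = map { $_ => 1 } qw(co.uk com.au com.br);

# Given a bare host name, return the registered domain, preceded by
# the name one label longer when the host has more labels than that.
sub domain_pair {
    my ($host) = @_;
    my @labels = split /\./, lc $host;

    # The suffix is two labels long for e.g. co.uk, otherwise one.
    my $suffix_len = 1;
    if (@labels >= 2 && $TWO_LABEL_SUFFIX{ join '.', @labels[-2, -1] }) {
        $suffix_len = 2;
    }

    my $need = $suffix_len + 1;            # labels in the registered domain
    return () if @labels < $need;          # only a suffix was given

    my $domain = join '.', @labels[ -$need .. -1 ];
    return ($domain) if @labels == $need;  # bare minimum: one result

    my $sub = join '.', @labels[ -($need + 1) .. -1 ];
    return ($sub, $domain);
}

print join(' ', domain_pair($_)), "\n"
    for qw(adsl-44-33-22-11.dsl.bcvloh.sbcglobal.net some.domain.com.au domain.co.uk);
```

Note that no regex is needed for this part: once the host name is isolated, split and array slices do the work, and the only hard part is knowing which suffixes are two labels long.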

Now, I've searched and read a couple of nodes here that are very similar to this question, but they aren't quite enough for me to work with. One splits up a domain name such as domain.com to extract domain, and the other focuses only on http:// URLs. I've also searched Google, and the results I found again don't give me quite enough to work with, as I am rather dense when it comes to regex.

Many Thanks Fellow Monks,

jnbek

=== Update ===

It looks like I have been approaching this from the wrong angle, and I have managed to make myself feel like the silly n00b that I am. I only need a regex to strip the extra characters off the front and back, leaving basically what sits between the /'s: strip off http://, ftp://, etc., then strip everything from the first /, ? or # to the right end, then use pop() a couple of times with a join to get the domain name. So whether it's sub1.sub2.sub3.foo.bar.www.domain.com or domain.com, I get domain.com to work with. So far I've only got initial test code with the pop() usage:

my $d = "spam.yomama.www.zoelife4u.org";
my @domain = split(/\./, $d);
my $tld = pop(@domain);        # org
my $baredomain = pop(@domain); # zoelife4u
my @result = ( $baredomain, $tld );
my $maindomain = join(".", @result);
print "End: $maindomain\n";

And I think I've found a useful regex to work with here. Based on this, does anyone have any critique?
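For what it's worth, the stripping steps described in the update might look roughly like the sketch below. The sub names `extract_host` and `last_two_labels` are hypothetical, the email address is a made-up example, and the substitutions are one guess at what "strip the front and back" should cover (scheme, user info or email local part, port, path, query, fragment), not a full URL parser.

```perl
use strict;
use warnings;

# Reduce a URL, email address, or bare string to its host name.
sub extract_host {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;                # trim surrounding whitespace
    $s =~ s{^[a-z][a-z0-9+.-]*://}{}i;   # strip a scheme such as http:// or ftp://
    $s =~ s{[/?#].*$}{}s;                # strip path, query string, fragment
    $s =~ s{^.*\@}{};                    # strip user info or an email local part
    $s =~ s{:\d+$}{};                    # strip a port number
    return lc $s;
}

# Keep the last two labels, mirroring the pop()/join approach.
sub last_two_labels {
    my ($host) = @_;
    my @labels = split /\./, $host;
    return $host if @labels < 2;
    return join '.', @labels[-2, -1];
}

for my $input (
    'http://www.perlmonks.org/?node=Seekers%20of%20Perl%20Wisdom',
    'someone@mail.example.com',          # made-up address for illustration
    'spam.yomama.www.zoelife4u.org',
) {
    printf "%s -> %s\n", $input, last_two_labels(extract_host($input));
}
```

One critique of the two-label approach itself: it reduces ftp.domain.co.uk to co.uk, so country-code names like the domain.co.uk and domain.com.au examples would need an extra check before the last two labels are taken.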

In reply to Regex for extracting a domain name from a string. by jnbek
