in reply to Extracting a bald host name
A very quick skim of both RFC1034 and RFC1035 only gives the following grammar as a "should" - if there are more rules in the text, I haven't found them yet. RFC2181 ("Clarifications to the DNS Specification") does say "Occasionally it is assumed that the Domain Name System serves only the purpose of mapping Internet host names to data, and mapping Internet addresses to host names. This is not correct, the DNS is a general (if somewhat limited) hierarchical database, and can store almost any kind of data, for almost any purpose."
    <domain> ::= <subdomain> | " "
    <subdomain> ::= <label> | <subdomain> "." <label>
    <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
    <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
    <let-dig-hyp> ::= <let-dig> | "-"
    <let-dig> ::= <letter> | <digit>
    <letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
    <digit> ::= any one of the ten digits 0 through 9
So splitting on dots seems to be fairly safe. If you wanted to go full CPAN on this:
    use Net::DNS::DomainName;

    my $host = "foo.example.com";
    my $bare = (Net::DNS::DomainName->new($host)->label)[0];
    print "$bare\n"; # prints "foo"
Although according to Net::DNS::DomainName, several of the examples in your tests are not valid domain names.
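For comparison, here is a rough core-Perl sketch of the plain split-on-dots approach, with the first label checked against the grammar quoted above. The helper name `first_label` is made up for illustration, and the check covers only that grammar (no length limits, none of the later relaxations), so treat it as a starting point rather than a full validator.

```perl
use strict;
use warnings;

# One label per the quoted grammar: starts with a letter, ends with a
# letter or digit, with letters, digits or hyphens in between.
my $LABEL = qr/\A[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?\z/;

# first_label() is a hypothetical helper, not part of any module.
# Returns the leftmost label, or undef if it does not fit the grammar.
sub first_label {
    my ($name) = @_;
    return undef unless defined $name && length $name;
    my ($first) = split /\./, $name;
    return undef unless defined $first && $first =~ $LABEL;
    return $first;
}

print first_label("foo.example.com"), "\n";   # prints "foo"
```

Names whose first label starts with a hyphen or contains other characters come back as undef, which may or may not be what you want for your test cases.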
For IP addresses, you might use Regexp::Common::net to identify those, and as for port numbers and query strings, I might use URI (if you are dealing with URLs, at least).
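Something along these lines might do for the URL case. It is only a sketch: `bare_host` is a made-up helper name, the URLs are invented examples, it assumes URI and Regexp::Common are installed from CPAN, and it only looks for IPv4 addresses.

```perl
use strict;
use warnings;
use URI;
use Regexp::Common qw(net);

# bare_host() is a hypothetical helper: let URI strip the scheme, port,
# path and query string, then return either the IPv4 address untouched
# or the first dot-separated label of the host name.
sub bare_host {
    my ($url) = @_;
    my $uri = URI->new($url);
    return undef unless $uri->can('host');      # no scheme with a host part
    my $host = $uri->host;
    return undef unless defined $host && length $host;
    return $host if $host =~ /\A$RE{net}{IPv4}\z/;
    return (split /\./, $host)[0];
}

print bare_host("http://foo.example.com:8080/index.html?x=1"), "\n";
print bare_host("http://192.0.2.10:8080/"), "\n";
```

Input without a scheme (a bare "foo.example.com:8080", say) is not treated as a URL here and returns undef, so you would have to handle that case separately.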