BlueLines has asked for the wisdom of the Perl Monks concerning the following question:

RFC 1912 spells out the rules for naming a machine on a network. Here's a quote:

DNS domain names consist of "labels" separated by single dots. Allowable characters in a label for a host name are only ASCII letters, digits, and the `-' character. Labels may not be all numbers, but may have a leading digit (e.g., 3com.com). Labels must end and begin only with a letter or digit.

This seems simple enough to write as one massive regex, but I've had little to no luck making _everything_ work. So here's what I'm using (after testing for bad characters with tr/a-zA-Z0-9.-//c, which counts any character that is not a letter, digit, '-' or '.'):
my $error = 0;
$error = 1 if $hostname =~ /\.$/;      # trailing . is bad
my @labels = split /\./, $hostname;
foreach my $label (@labels) {
    $error = 1 if $label eq '';        # empty label (a..b) is bad
    $error = 1 if $label =~ /^-/;      # can't start with a -
    $error = 1 if $label =~ /-$/;      # can't end with a -
    $error = 1 if $label =~ /^\d+$/;   # can't be only numeric
    last if $error;
}
if ($error) {
    print "Not a hostname!\n";
} else {
    print "A hostname!\n";
}
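The same checks can be folded into a reusable sub. This is a minimal sketch, not a definitive validator: the name is_rfc1912_hostname is hypothetical, it folds in the character test (tr///c with an empty replacement list counts the characters outside the list without changing the string) and the empty-label case, and, like the loop, it ignores length limits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $hostname follows the RFC 1912 label
# rules quoted above, 0 otherwise (length limits are not checked).
sub is_rfc1912_hostname {
    my ($hostname) = @_;

    # tr///c with an empty replacement list counts characters that are NOT
    # letters, digits, '.' or '-', and leaves the string unchanged.
    return 0 if $hostname =~ tr/a-zA-Z0-9.-//c;

    # split drops trailing empty fields, so check for "" and a trailing dot.
    return 0 if $hostname eq '' || $hostname =~ /\.\z/;

    foreach my $label (split /\./, $hostname) {
        return 0 if $label eq '';          # empty label, as in a..b or .a
        return 0 if $label =~ /\A-/;       # can't start with a -
        return 0 if $label =~ /-\z/;       # can't end with a -
        return 0 if $label =~ /\A\d+\z/;   # can't be only numeric
    }
    return 1;
}

for my $name (qw(3com.com www.example.com a..b -foo.com 411.com)) {
    printf "%-16s %s\n", $name, is_rfc1912_hostname($name) ? "ok" : "bad";
}
```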
This works, but doesn't look very JAPH-esque. I started to write a regex, but after it got longer than one line I gave up (faster development always beats l33ter code). Here's what I started on:
$hostname =~ /^[^-]([a-zA-Z\-])+[^-](\.[^-][a-zA-Z0-9\-]+?[^-])+?/;
Ugh. Each [^-] (the one at the start of a label and the one at the end) consumes a character of its own, and the quantified class between them needs at least one more, so the pattern fails whenever a label is shorter than three characters. Anyone have an idea on how to implement this in one line?
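A minimal one-regex sketch, covering only the rules in the quoted RFC text (no length limits): instead of [^-] at both ends, each label is one letter or digit followed by an optional group that must itself end in a letter or digit, so one- and two-character labels work, and a negative lookahead rejects labels made only of digits. The variable names are hypothetical; \z is used instead of $ because $ also matches before a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical single-regex version of the quoted RFC 1912 label rules.
my $label = qr/
    (?! \d+ (?![A-Za-z0-9-]) )        # not a label made only of digits
    [A-Za-z0-9]                       # begins with a letter or digit
    (?: [A-Za-z0-9-]* [A-Za-z0-9] )?  # optional rest, ends with a letter or digit
/x;

# One or more labels separated by single dots, nothing else.
my $hostname_re = qr/\A $label (?: \. $label )* \z/x;

for my $name (qw(a 3com.com x-1.example.com 411.com foo-.com a..b)) {
    printf "%-18s %s\n", $name, ($name =~ $hostname_re ? "ok" : "bad");
}
```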

Bonus Question:

More from the RFC:

You should also be careful to not have addresses which are valid alternate syntaxes to the inet_ntoa() library call. For example 0xe is a valid name, but if you were to type "telnet 0xe", it would try to connect to IP address 0.0.0.14. It is also rumored that there exists some broken inet_ntoa() routines that treat an address like x400 as an IP address.


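Any ideas? Perhaps evaling an inet_aton() call on each label? (inet_aton() is the call that parses these alternate syntaxes; inet_ntoa() converts in the other direction.)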
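One way to approach the bonus question without calling the C library, sketched under assumptions: Perl's Socket::inet_aton also accepts host names and may trigger a DNS lookup, so this hypothetical looks_like_inet_address helper just pattern-matches the numeric forms a BSD-style inet_aton() parses (up to four dot-separated parts, each decimal, octal with a leading 0, or hex with a leading 0x), plus the x400-style form the RFC mentions. It is a conservative heuristic, not a model of every parser. Because users often type just the short name, the usage loop checks both the full name and its first label.

```perl
use strict;
use warnings;

# Hypothetical heuristic: true if $name looks like one of the numeric
# forms that inet_aton()-style parsers accept (1 to 4 dot-separated parts,
# each decimal, octal with a leading 0, or hex with a leading 0x), or like
# the "x400" form the RFC says some broken routines treat as an address.
sub looks_like_inet_address {
    my ($name) = @_;
    my $part = qr/(?:0[xX][0-9a-fA-F]+|[xX][0-9a-fA-F]+|[0-9]+)/;
    return $name =~ /\A$part(?:\.$part){0,3}\z/ ? 1 : 0;
}

for my $name (qw(0xe x400 127.0.0.1 3com www.example.com)) {
    my $short = (split /\./, $name)[0];    # the name users may type alone
    my $risky = looks_like_inet_address($name)
             || looks_like_inet_address($short);
    printf "%-16s %s\n", $name, $risky ? "risky" : "ok";
}
```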

BlueLines

Disclaimer: This post may contain inaccurate information, be habit forming, cause atomic warfare between peaceful countries, speed up male pattern baldness, interfere with your cable reception, exile you from certain third world countries, ruin your marriage, and generally spoil your day. No batteries included, no strings attached, your mileage may vary.

(tye)Re: Matching RFC1912 compliant hostnames
by tye (Sage) on Nov 18, 2000 at 06:44 UTC
Re: Matching RFC1912 compliant hostnames
by fundflow (Chaplain) on Nov 18, 2000 at 05:03 UTC
    (Ignore this... go straight to the pointers by tye)

    Here's something quick, until someone comes up with a better one:

    perl -ne 'print "YES\n" if /^[a-z0-9][a-z0-9.-]*[a-z0-9]$/ && /[a-z]/ && ! /\.\./'
    (This doesn't accept 1-letter names, which is probably okay)

    Update: the original version accepted a..b; this one doesn't (and is less elegant :)
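    To see what the quick version still lets through, the same three tests can be wrapped in a sub (quick_check is a hypothetical name) and run on a few made-up names. Traced by hand, it accepts a.-b, a-.b and a.123, whose labels start or end with a hyphen or are all digits, and it rejects Foo.com because the character classes are lowercase only.

```perl
use strict;
use warnings;

# The three tests from the one-liner, wrapped in a hypothetical sub.
sub quick_check {
    local $_ = shift;
    my $ok = /^[a-z0-9][a-z0-9.-]*[a-z0-9]$/ && /[a-z]/ && !/\.\./;
    return $ok ? 1 : 0;
}

for my $name (qw(foo.com a.-b a-.b a.123 Foo.com a..b)) {
    printf "%-8s %s\n", $name, quick_check($name) ? "YES" : "no";
}
```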

      It might be easier (well, more readable anyway) to break it up into components:
      my $alpha    = 'a-zA-Z';
      my $alphanum = $alpha . '\d';
      my $any      = $alphanum . '-';
      my $label = qr/
          (?:[$alpha]|              # start with a letter
           \d[$any]*[$alpha]+)      # or a number if there's a letter elsewhere
          (?:
              [$any]*               # followed by more stuff maybe
              [$alphanum]           # as long as it doesn't end with a -
          )?
      /x;
      $is_valid = /^(?:$label\.)*$label$/;
      Note that while this matches RFC 1912 hostnames, some newer hostnames have been belatedly declared "valid" though they do not adhere to this RFC (specifically, domains like 411.com or 800.com, which violate the "labels may not be all numbers" rule). Perhaps there is a newer RFC that supersedes 1912?
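      If names like 411.com should pass, one possible relaxation (an assumption of this sketch, not a rule from RFC 1912) lets any label be all digits but still requires the last label to contain a letter, so dotted-decimal strings such as 10.0.0.1 are still rejected. It reuses the component style above; is_relaxed_hostname and $alpha_label are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical relaxed variant: labels may be all digits (411.com),
# but the last label must contain a letter, so 10.0.0.1 is rejected.
my $alpha    = 'a-zA-Z';
my $alphanum = $alpha . '\d';
my $any      = $alphanum . '-';

# Begins and ends with a letter or digit, hyphens allowed in between.
my $label = qr/[$alphanum](?:[$any]*[$alphanum])?/;

# Same shape, with at least one letter somewhere in the label.
my $alpha_label = qr/(?=[$any]*[$alpha])$label/;

sub is_relaxed_hostname {
    my ($name) = @_;
    return $name =~ /\A(?:$label\.)*$alpha_label\z/ ? 1 : 0;
}

for my $name (qw(411.com 800.com 3com.com 10.0.0.1 foo-.com)) {
    printf "%-10s %s\n", $name, is_relaxed_hostname($name) ? "ok" : "bad";
}
```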

      Perhaps somebody has a better solution...

        "I was married to a regex, now it's my ex-reg"