eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

I want to strip trailing stuff from a host name. To clarify, running this test program t1.pl:

use strict;
use warnings;
use Test::More;

my @expected = (
    [ 'abc',             'abc'  ],
    [ 'abc.bill.com',    'abc'  ],
    [ 'abc.bill.com.au', 'abc'  ],
    [ 'xy42.com',        'xy42' ],
    [ 'x_y.com',         'x_y'  ],
    [ 'x-y.com',         'x-y'  ],
    [ '',                ''     ],
    [ '.',               ''     ],
    [ 'a.',              'a'    ],
    [ '-.',              '-'    ],
    [ '_.',              '_'    ],
    [ '.a',              ''     ],
    [ 'f',               'f'    ],
    [ 'f.1',             'f'    ],
    [ 'f.1.2',           'f'    ],
    [ 'f.1.2.3',         'f'    ],
    [ 'f.1.2.3.4',       'f'    ],
    [ 'f.1.2.3.4.5',     'f'    ],
    [ 'f.1.2.3.4.5.67',  'f'    ],
    [ 'ABC.123.456',     'ABC'  ],
);

plan tests => scalar(@expected);

for my $e (@expected) {
    my ( $got, $exp ) = @{$e};
    $got =~ s/\..*$//;
    is( $got, $exp, "'$e->[0]'" . ' -> ' . "'$got'" );
}

produces:

1..20
ok 1 - 'abc' -> 'abc'
ok 2 - 'abc.bill.com' -> 'abc'
ok 3 - 'abc.bill.com.au' -> 'abc'
ok 4 - 'xy42.com' -> 'xy42'
ok 5 - 'x_y.com' -> 'x_y'
ok 6 - 'x-y.com' -> 'x-y'
ok 7 - '' -> ''
ok 8 - '.' -> ''
ok 9 - 'a.' -> 'a'
ok 10 - '-.' -> '-'
ok 11 - '_.' -> '_'
ok 12 - '.a' -> ''
ok 13 - 'f' -> 'f'
ok 14 - 'f.1' -> 'f'
ok 15 - 'f.1.2' -> 'f'
ok 16 - 'f.1.2.3' -> 'f'
ok 17 - 'f.1.2.3.4' -> 'f'
ok 18 - 'f.1.2.3.4.5' -> 'f'
ok 19 - 'f.1.2.3.4.5.67' -> 'f'
ok 20 - 'ABC.123.456' -> 'ABC'

I'm pretty sure I can assume my input is just an alphanumeric host name, for example fred42 or fred.com but not 192.0.2.16 say. I further doubt I need to deal with ports :80 or ?query or other guff. Though the above crude hack will probably be adequate for my needs, I'm interested to learn how other folks might tackle this sort of problem.

Re: Extracting a bald host name
by haukex (Archbishop) on Oct 22, 2018 at 08:01 UTC

    A very quick skim of both RFC1034 and RFC1035 only gives the following grammar as a "should" - if there are more rules in the text, I haven't found them yet. RFC2181 ("Clarifications to the DNS Specification") does say "Occasionally it is assumed that the Domain Name System serves only the purpose of mapping Internet host names to data, and mapping Internet addresses to host names. This is not correct, the DNS is a general (if somewhat limited) hierarchical database, and can store almost any kind of data, for almost any purpose."

    <domain> ::= <subdomain> | " "
    <subdomain> ::= <label> | <subdomain> "." <label>
    <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
    <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
    <let-dig-hyp> ::= <let-dig> | "-"
    <let-dig> ::= <letter> | <digit>
    <letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
    <digit> ::= any one of the ten digits 0 through 9

    So splitting on dots seems to be fairly safe. If you wanted to go full CPAN on this:

    use Net::DNS::DomainName;
    my $host = "foo.example.com";
    my $bare = (Net::DNS::DomainName->new($host)->label)[0];
    print "$bare\n"; # prints "foo"

    Although according to Net::DNS::DomainName, several of the examples in your tests are not valid domain names.
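    To see which inputs fall outside the "should" grammar quoted above, the <label> rule can be approximated with a regex. This is only a rough sketch in core Perl: the helper name looks_like_rfc1035_name is made up for illustration, it does not check label or name length limits, and it is not a claim about how Net::DNS::DomainName itself validates names.

```perl
use strict;
use warnings;

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# i.e. starts with a letter, ends with a letter or digit,
# with letters, digits and hyphens in between.
my $LABEL = qr/[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?/;

# Hypothetical helper: returns 1 if every dot-separated label
# matches the quoted grammar, 0 otherwise.
sub looks_like_rfc1035_name {
    my ($name) = @_;
    return $name =~ /\A$LABEL(?:\.$LABEL)*\z/ ? 1 : 0;
}

for my $name ( 'abc.bill.com', 'xy42.com', 'x-y.com', 'x_y.com', 'f.1', '-.', '.a' ) {
    printf "%-16s %s\n", "'$name'",
        looks_like_rfc1035_name($name) ? 'matches' : 'does not match';
}
```

    Under this reading, the underscore in 'x_y.com' and the all-digit labels in 'f.1' fall outside the grammar, although the plain s/\..*$// strip still handles them.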

    For IP addresses, you might use Regexp::Common::net to identify those, and as for port numbers and query strings, I might use URI (if you are dealing with URLs, at least).
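    If pulling in those modules feels like overkill, a core-only stand-in might look like the sketch below. It is not Regexp::Common::net or URI, just a hand-rolled approximation: the helper name bare_host is hypothetical, only dotted-quad IPv4 and a trailing :port are handled, and query strings or IPv6 are ignored.

```perl
use strict;
use warnings;

# One IPv4 octet, 0 through 255 (hand-written, not from Regexp::Common).
my $OCTET = qr/(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])/;

# Hypothetical helper: strip everything after the first dot,
# but leave a dotted-quad IPv4 address untouched.
sub bare_host {
    my ($host) = @_;
    $host =~ s/:[0-9]+\z//;                                  # drop a trailing :port, if any
    return $host if $host =~ /\A$OCTET(?:\.$OCTET){3}\z/;    # looks like IPv4: keep as is
    ( my $bare = $host ) =~ s/\..*\z//s;                     # same strip as in the question
    return $bare;
}

print bare_host($_), "\n" for 'fred42', 'fred.com:80', '192.0.2.16', 'f.1.2.3';
```

    Note that 'f.1.2.3' is still stripped to 'f', because its first part is not a numeric octet.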

Re: Extracting a bald host name
by choroba (Cardinal) on Oct 22, 2018 at 07:45 UTC
    I'd probably use a non-destructive match instead of substitution:
    my ($result) = $got =~ /^[^.]*/g;
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
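    For comparison, here is a small sketch of the non-destructive match next to a split-based variant. The sub names are made up for illustration, and the // operator assumes Perl 5.10 or later. The one difference worth knowing is that split on an empty string returns an empty list, so the split version needs a default to avoid undef.

```perl
use strict;
use warnings;

# Non-destructive match: capture everything before the first dot.
# [^.]* can match zero characters, so this always succeeds.
sub first_label_by_match {
    my ($host) = @_;
    my ($label) = $host =~ /^([^.]*)/;
    return $label;
}

# split with a limit of 2 stops after the first dot.
# Splitting '' yields an empty list, hence the // '' default.
sub first_label_by_split {
    my ($host) = @_;
    my ($label) = split /\./, $host, 2;
    return $label // '';
}

for my $host ( 'abc.bill.com.au', '.a', '' ) {
    printf "'%s' -> '%s' / '%s'\n", $host,
        first_label_by_match($host), first_label_by_split($host);
}
```

    Both leave the original string untouched, which is handy when the full name is still needed afterwards.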
Re: Extracting a bald host name
by harangzsolt33 (Deacon) on Oct 22, 2018 at 07:36 UTC
    It looks like you are trying to extract a part of a string that comes before the period.
    use strict;
    use warnings;

    print StrBefore('abc.com', '.');

    #
    # This function splits a string into two parts
    # at the first occurrence of SUBSTR.
    # Returns the first half of the string.
    #
    # Usage: STRING = StrBefore(STRING, SUBSTR)
    #
    # Example: StrBefore("Abcdef", "cd") --> "Ab"
    #          StrBefore("abc.us", ".") --> "abc"
    #          StrBefore("tree", ".") --> "tree"
    #
    sub StrBefore {
        @_ or return '';
        my $S = shift;
        defined $S or return '';
        length($S) or return '';
        @_ or return $S;
        my $B = shift;
        defined $B or return $S;
        length($B) or return $S;
        my $P = index($S, $B);
        return ($P < 0) ? $S : substr($S, 0, $P);
    }