johnnywang has asked for the wisdom of the Perl Monks concerning the following question:

Given a string, I'd like to check syntactically whether it could be a valid domain name or IP address. I found this post by merlyn from 2000, Test the syntactic validity of a domain name, but it seems a little too strict, as others have pointed out (e.g., it rejects 411.com). Is there a more "correct" solution, both in terms of speed and in what it accepts? Currently I'm using the following code, which also includes some test cases. Are there other cases I should test?
```perl
use strict;
use Test::More 'no_plan';

my @good = ("www.foo.com", "www.411.com", "123.34.3.5", "web.foo.info",
            "www.foo.co.uk", "aaaa-4-bbbb.cCc.COM");
my @bad  = ("foo.23.4.2", "foo", "23", "234.12.4.5.4", "2345.23.4.43");

ok(is_valid($_))  foreach @good;
ok(!is_valid($_)) foreach @bad;

sub is_valid {
    my $name = shift;
    my $name = lc $name;
    return ($name
        && $name =~ /^[-\.\w]+\.\w+$/    # at least two levels
        && ($name !~ /\.\d+$/             # if last part is digit, better be ip
            || $name =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/)
    );
}
```
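As for other cases: running the same two regexes against inputs that the replies below treat as invalid shows what they still let through. This is a condensed copy of is_valid() (same patterns, written with a single `my` and `use warnings`), not a replacement for it.

```perl
use strict;
use warnings;

# Condensed copy of the question's is_valid(): same two checks.
sub is_valid {
    my $name = lc shift;
    my $ok = $name
        && $name =~ /^[-.\w]+\.\w+$/                            # at least two levels
        && ($name !~ /\.\d+$/                                    # last part not numeric,
            || $name =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/);  # or it looks like a dotted quad
    return $ok ? 1 : 0;
}

# Inputs the replies treat as invalid, which these regexes still accept.
for my $case ('123.456.78.90', '-foo.com', 'foo..com') {
    printf "%-14s %s\n", $case, is_valid($case) ? 'accepted' : 'rejected';
}
```

All three come out as accepted: the dotted-quad branch never checks that each part is at most 255, and nothing stops a label from being empty or starting with a hyphen.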

Re: Syntactically check domain name/IP
by tachyon (Chancellor) on Oct 01, 2004 at 01:00 UTC

    This should do a pretty fair job. Depending on the application, you might want to return $_ (the cleaned-up name) instead of 1 for true.

```perl
{
    my ( $TLD, $ccTLD );

    sub valid_domain {
        local $_ = shift;

        # user:pass@401authen.com is ~ valid and useful for more than phishing.
        if ( m/^[^@]+@/ ) {
            (undef, $_) = split '@';
        }

        # domain.com:8080 is ~ valid/common
        if ( m/:\d+\z/ ) {
            ($_) = split ':';
        }

        # having dealt with edge cases check for the valid chars
        return 0 unless m/^[A-Za-z0-9\.\-]+\z/;

        # domains are case insensitive and we need lc for hash table lookup
        $_ = lc;

        # end in digits can only be dot quad
        if ( m/\.\d+$/ ) {
            return 0 unless m/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
            return 1 if $1 < 256 and $2 < 256 and $3 < 256 and $4 < 256;
        }
        # end in alpha chars and appropriate \w\.\w{2,}$ type syntax
        elsif ( m/\.([A-Za-z]{2,})$/ ) {
            ( $TLD, $ccTLD ) = init_tld() unless $TLD;
            return 0 if m/^[.-]/ or m/\.{2}/ or m/\-\.|\.\-/;
            return 1 if exists $TLD->{$1} or exists $ccTLD->{$1};
        }

        # everything else is invalid
        return 0
    }
}

sub init_tld {
    my ( %TLD, %ccTLD );
    @TLD{ qw( aero arpa biz com coop edu gov info int mil museum name nato net org pro ) } = ();
    @ccTLD{ qw(
        ac ad ae af ag ai al am an ao aq ar as at au aw az
        ba bb bd be bf bg bh bi bj bm bn bo br bs bt bv bw by bz
        ca cc cd cf cg ch ci ck cl cm cn co cr cu cv cx cy cz
        de dj dk dm do dz ec ee eg eh er es et fi fj fk fm fo fr fx
        ga gd ge gf gg gh gi gl gm gn gp gq gr gs gt gu gw gy
        hk hm hn hr ht hu id ie il im in io iq ir is it je jm jo jp
        ke kg kh ki km kn kp kr kw ky kz la lb lc li lk lr ls lt lu lv ly
        ma mc md mg mh mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz
        na nc ne nf ng ni nl no np nr nu nz om
        pa pe pf pg ph pk pl pm pn pr ps pt pw py qa re ro ru rw
        sa sb sc sd se sg sh si sj sk sl sm sn so sr st sv sy sz
        tc td tf tg th tj tk tm tn to tp tr tt tv tw tz ua ug uk um us uy uz
        va vc ve vg vi vn vu wf ws ye yt yu za zm zw
    ) } = ();
    return \%TLD, \%ccTLD;
}
```
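    valid_domain() keeps its TLD tables in lexicals declared inside a bare block, so only the sub can see them, and builds them on first use. A reduced sketch of that pattern follows; is_known_tld(), build_tld_table() and the six-entry table are hypothetical stand-ins, and the counter exists only to show that the table is built once. Any hard-coded list like init_tld()'s is a snapshot: the IANA root zone has gained many TLDs since 2004.

```perl
use strict;
use warnings;

my $table_builds = 0;   # demo counter: how many times the table was built

{
    my $TLD;   # visible only to is_known_tld(), kept between calls

    sub is_known_tld {
        my $tld = lc shift;
        $TLD ||= build_tld_table();    # built lazily, on the first call only
        return exists $TLD->{$tld} ? 1 : 0;
    }
}

# Hypothetical, deliberately tiny table; init_tld() above lists far more.
sub build_tld_table {
    $table_builds++;
    my %t;
    @t{ qw(com net org de jp uk) } = ();
    return \%t;
}

print join(' ', map { is_known_tld($_) } qw(com DE xyzzy)), "\n";   # 1 1 0
print "table built $table_builds time(s)\n";                        # table built 1 time(s)
```

    On Perl 5.10 and later, a `state` variable inside the sub does the same job without the enclosing block.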

    cheers

    tachyon

```perl
# having dealt with edge cases check for the valid chars
return 0 unless m/^[A-Za-z0-9\.\-]+\z/;
```

      Depending on why you are testing the input, you may want to allow underscores there, so that names like dear_raed.blogspot.com get through. That's technically invalid but is in use: it resolves for me at home but not in the office.

      Smylers

        You just gotta love standards on the Internet ;-) I didn't know that _ was supported, so s/A-Za-z0-9/\w/ in that character class. Thanks for the heads up.

        cheers

        tachyon
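        One caveat with s/A-Za-z0-9/\w/: in Perl, \w can match more than [A-Za-z0-9_]. Under Unicode rules it also matches non-ASCII letters such as ü, so the class would accept raw internationalized names as well. The /a modifier restricts \w to ASCII. A small comparison, assuming Perl 5.14 or later (where /a and /u were introduced, well after this thread):

```perl
use strict;
use warnings;

# tachyon's character class with \w substituted, under two rule sets.
my $unicode_w = qr/\A[\w.\-]+\z/u;   # \w = any Unicode word character
my $ascii_w   = qr/\A[\w.\-]+\z/a;   # \w = [A-Za-z0-9_] only

my %hosts = (
    underscore => 'dear_raed.blogspot.com',
    umlaut     => "m\x{fc}ller.de",       # "müller.de" as a character string
);

for my $label (sort keys %hosts) {
    my $host = $hosts{$label};
    printf "%-10s /u: %-3s /a: %s\n", $label,
        ($host =~ $unicode_w ? 'yes' : 'no'),
        ($host =~ $ascii_w   ? 'yes' : 'no');
}
```

        Both variants accept the underscore name; only the /u one accepts the unencoded ü.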

Re: Syntactically check domain name/IP
by zeimusu (Sexton) on Oct 01, 2004 at 00:09 UTC

    No code offered, but the relevant RFC is RFC 1035, Domain Names - Implementation and Specification.

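    Strictly, this disallows 411.com, since RFC 1035's preferred syntax requires each label of a domain name to start with a letter (not a digit). So what is valid in practice is clearly at variance with that spec (RFC 1123, section 2.1, later relaxed the rule to allow a leading digit in host names).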
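    A label-by-label check along these lines might look like the sketch below. It assumes the RFC 1123 relaxation (a label may start with a digit, so 411.com passes), the usual 63-character label limit and the commonly cited 253-character limit for a whole name written without a trailing dot. is_ldh_hostname() is a hypothetical name; it does not require two levels or check the TLD.

```perl
use strict;
use warnings;

# Sketch: "letter-digit-hyphen" host name check with RFC 1123's
# relaxation, so labels may start with a digit.
sub is_ldh_hostname {
    my $name = shift;
    return 0 unless defined $name;
    $name =~ s/\.\z//;                        # tolerate one trailing root dot
    return 0 if $name eq '' || length($name) > 253;
    for my $label (split /\./, $name, -1) {   # -1 keeps empty labels ("foo..com")
        # 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen
        return 0
            unless $label =~ /\A[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z/;
    }
    return 1;
}

for my $host (qw(www.411.com aaaa-4-bbbb.cCc.COM -foo.com foo..com dear_raed.blogspot.com)) {
    printf "%-24s %s\n", $host, is_ldh_hostname($host) ? 'ok' : 'rejected';
}
```

    Note that an all-numeric string such as 123.456.78.90 passes this check, which is why it needs pairing with a separate dotted-quad test.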

    Also, the specification is in flux at the moment with the introduction of internationalized domain names (already available under the .jp TLD).

Re: Syntactically check domain name/IP
by NetWallah (Canon) on Oct 01, 2004 at 00:03 UTC
    You will need to watch out for:
    • Reverse domain names, such as 1.4.220.134.in-addr.arpa
    • Invalid IPs, such as 123.456.78.90
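    For the second point, a dotted-quad test needs an explicit range check on each part. A minimal sketch, with is_ipv4() as a hypothetical helper; it uses [0-9] rather than \d because \d can also match non-ASCII digits in a character string. Reverse names like 1.4.220.134.in-addr.arpa end in an alphabetic label, so they belong to the domain-name path rather than this one.

```perl
use strict;
use warnings;

# Sketch: accept only a dotted quad whose four parts are 0..255.
sub is_ipv4 {
    my $s = shift;
    return 0 unless defined $s
        && $s =~ /\A([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\z/;
    return (grep { $_ > 255 } $1, $2, $3, $4) ? 0 : 1;
}

for my $s ('134.220.4.1', '123.456.78.90', '1.4.220.134.in-addr.arpa') {
    printf "%-26s %s\n", $s, is_ipv4($s) ? 'IPv4' : 'not IPv4';
}
```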

        Earth first! (We'll rob the other planets later)

Re: Syntactically check domain name/IP
by Anonymous Monk on Oct 01, 2004 at 10:09 UTC
    Damian Conway's module Regexp::Common has a regex for IP addresses ($RE{net}{IPv4}).
Re: Syntactically check domain name/IP
by Beechbone (Friar) on Oct 01, 2004 at 14:53 UTC
    What about www.müller.de? It's a completely valid and working domain (although the content is rubbish).

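    Just change my $name = lc $name; to $name = eval { punycode(lc $name) };, where punycode() stands for a routine that converts each non-ASCII label to its ASCII "xn--" form, and add the Punycode code to your program. That'll do the trick.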
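    The punycode() above is not a specific library function. As an illustration only, here is a self-contained sketch of the RFC 3492 Punycode encoder plus a hypothetical domain_to_ascii() helper that applies it per label and adds the "xn--" prefix. It only lowercases and skips IDNA's nameprep normalization and overflow checks, so it is not a full IDNA implementation.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Bootstring parameters for Punycode (RFC 3492, section 5).
use constant {
    BASE => 36, TMIN => 1, TMAX => 26, SKEW => 38, DAMP => 700,
    INITIAL_BIAS => 72, INITIAL_N => 128,
};

# Map a digit value 0..35 to a..z, 0..9.
sub _digit {
    my $d = shift;
    return chr($d < 26 ? ord('a') + $d : ord('0') + $d - 26);
}

# Bias adaptation function (RFC 3492, section 6.1).
sub _adapt {
    my ($delta, $numpoints, $first) = @_;
    $delta = $first ? int($delta / DAMP) : int($delta / 2);
    $delta += int($delta / $numpoints);
    my $k = 0;
    while ($delta > int(((BASE - TMIN) * TMAX) / 2)) {
        $delta = int($delta / (BASE - TMIN));
        $k += BASE;
    }
    return $k + int(((BASE - TMIN + 1) * $delta) / ($delta + SKEW));
}

# Encode one label (a Perl character string); no overflow checks.
sub punycode_encode {
    my @cp    = map { ord($_) } split //, shift;
    my $out   = join '', map { chr($_) } grep { $_ < 0x80 } @cp;
    my $basic = length $out;
    my $h     = $basic;
    $out .= '-' if $basic;
    my ($n, $delta, $bias) = (INITIAL_N, 0, INITIAL_BIAS);

    while ($h < @cp) {
        my $m = min grep { $_ >= $n } @cp;    # next code point to insert
        $delta += ($m - $n) * ($h + 1);
        $n = $m;
        for my $c (@cp) {
            $delta++ if $c < $n;
            next unless $c == $n;
            my $q = $delta;
            for (my $k = BASE; ; $k += BASE) {
                my $t = $k <= $bias ? TMIN() : $k >= $bias + TMAX() ? TMAX() : $k - $bias;
                last if $q < $t;
                $out .= _digit($t + ($q - $t) % (BASE - $t));
                $q = int(($q - $t) / (BASE - $t));
            }
            $out .= _digit($q);
            $bias  = _adapt($delta, $h + 1, $h == $basic);
            $delta = 0;
            $h++;
        }
        $delta++;
        $n++;
    }
    return $out;
}

# Hypothetical helper: lowercase, then Punycode each non-ASCII label
# and prefix it with "xn--" (no nameprep/NFKC normalization).
sub domain_to_ascii {
    my $name = lc shift;
    return join '.',
        map { /[^\x00-\x7F]/ ? 'xn--' . punycode_encode($_) : $_ }
        split /\./, $name, -1;
}

my $ascii = domain_to_ascii("www.m\x{fc}ller.de");
print "$ascii\n";    # www.xn--mller-kva.de
```

    The ASCII result then passes the question's original regexes unchanged.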


    Search, Ask, Know