in reply to Regex for hostname validation

hello hrcerq,

Sorry, I'm not the right guy to review your regex, but I'd suggest a totally different approach.

You have already extracted the rules from the fun plethora of RFCs and, as you said, there are many complications and dark corners. So I'd prefer a more verbose dedicated subroutine that is easier to implement and to extend or, in an ideal world, a dedicated module. And yes, Perl's ecosystem is nearly an ideal world and we have Data::Validate::Domain, but as I read on your home node: "I like to take the most out of core Perl before resorting to external modules." So let's assume you want your own solution.. but look carefully at Data-Validate-Domain.t and also at Regexp-Common's plethora of RFC utilities for URIs...

To be precise, you must be more explicit about what you want to validate: hostnames to browse or hostnames for a DNS entry? In fact you mention internationalised hostnames, but for the former you have Unicode and for the latter ACE strings.
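As a rough illustration of the ACE side of that distinction: ACE labels are plain ASCII and start with the prefix "xn--". The sketch below only looks for that prefix; `has_ace_label` is a hypothetical name, and the sub does not check that the rest of the label is well-formed Punycode.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of dot-separated labels that
# carry the ACE prefix "xn--" (compared case-insensitively).
# It does NOT verify that the remainder of the label decodes correctly.
sub has_ace_label {
    my ($hostname) = @_;
    my @ace = grep { /\Axn--/i } split /\./, $hostname;
    return scalar @ace;
}

for my $name ('xn--bcher-kva.example', 'example.com') {
    printf "%-25s %s\n", $name, has_ace_label($name) ? 'has ACE label' : 'no ACE label';
}
```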

..but going on your own, and assuming you are not looking for DNS entries, I'd go with something like

sub validate_hostname {
    my $candidate = shift;
    my ($ascii_only, $verbose, $debug) = @_;  # leave room for improvements and flexibility
    my ($return, $descr);

    # non ASCII
    if ( $candidate =~ m/[^[:ascii:]]/ ) {  # but see: https://perlmonks.org/?node_id=11164574
        print "Not ASCII\n" if $verbose;
        # ..accepted?
        if ($ascii_only) {
            return wantarray ? (undef, "non ASCII string [$candidate] rejected") : undef;
        }
        # go with another specialized sub.. and return its verdict,
        # otherwise we would fall through to the ASCII checks below
        return validate_hostname_Unicode($candidate, $ascii_only, $verbose, $debug);
    }

    # ASCII
    # too long..
    if (length($candidate) >= 255) {
        # parentheses matter: "length $candidate . ' chars)'" would measure the joined string
        $descr = "[$candidate] is too long (" . length($candidate) . " chars)";
        print "$descr\n" if $verbose;
        return wantarray ? (undef, $descr) : undef;
    }

    # Hostnames might be composed of 1 or more labels (separated by dots)
    unless ($candidate =~ /\./) {
        $descr = "[$candidate] contains no dots";
        print "$descr\n" if $verbose;
        return wantarray ? (undef, $descr) : undef;
    }
    # .. more checks for this rule

    # Each label may have at most 63 characters ...

    #..have fun :)
}
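One possible way to fill in the per-label part, as a minimal sketch only. `first_label_error` is a hypothetical helper, not something from the original question. It assumes the usual letters-digits-hyphen rule and treats a trailing dot as an error, which may or may not suit your hosts files.

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef when every label looks fine,
# otherwise a description of the first problem found.
# Assumptions: ASCII input, letters/digits/hyphen only, no trailing dot.
sub first_label_error {
    my ($candidate) = @_;

    # split returns an empty list for an empty string, so check it first
    return "empty hostname" if $candidate eq '';

    # limit -1 keeps trailing empty fields, so "example.com." and "a..b"
    # both produce an empty label that is reported below
    for my $label (split /\./, $candidate, -1) {
        return "empty label in [$candidate]"
            if $label eq '';
        return "label [$label] is longer than 63 characters"
            if length($label) > 63;
        return "label [$label] has characters other than letters, digits and '-'"
            if $label !~ /\A[A-Za-z0-9-]+\z/;
        return "label [$label] starts or ends with a hyphen"
            if $label =~ /\A-|-\z/;
    }
    return undef;
}

for my $name ('example.com', 'a..b', '-bad.example') {
    my $error = first_label_error($name);
    print "$name: ", defined $error ? $error : 'ok', "\n";
}
```

Returning a single value (error text or undef) sidesteps the wantarray juggling; inside validate_hostname you could turn a defined result into the (undef, $descr) pair.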

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^2: Regex for hostname validation
by hrcerq (Monk) on May 03, 2025 at 01:03 UTC

    Thanks for your thoughtful reply.

    ... but look carefully at Data-Validate-Domain.t

    I will. I'm not against using external modules, it's just that I like to keep things minimal. So unless I'd have to reimplement an entire module and doing so isn't trivial, I'd rather "reinvent the wheel", as core Perl is already a very nice toolbox and we can accomplish a lot with it.

    And yes, looking at the module tests will help a lot, thank you for the suggestion.

    ... hostnames to browse or hostnames for a DNS entry? In fact you mention internationalised hostnames ...

    I mistakenly mentioned IDNs just to make a point about the relationships between RFCs and the complexities involved, but I don't really have to cope with Unicode here, because I'm dealing with hosts file entries, so even if they used IDNs, they'd already be Punycode-encoded.

    Yet your suggestion was valuable, because if I wanted to validate names before they're encoded, I might do something like that. It also reminded me that I might (and should) use the /a modifier here.
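A small sketch of what /a changes, for the record: with it, \d, \w, \s and the POSIX classes match ASCII characters only. The variable names are just for illustration.

```perl
use 5.014;      # the /a modifier needs Perl 5.14 or later
use warnings;

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE, a non-ASCII digit

# Unicode rules: \d matches any decimal digit, including this one
my $unicode_match = $arabic_three =~ /\A\d\z/  ? 1 : 0;

# with /a, \d is restricted to [0-9]
my $ascii_match   = $arabic_three =~ /\A\d\z/a ? 1 : 0;

print "without /a: $unicode_match, with /a: $ascii_match\n";
```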

    I just wonder which approach would have less impact on performance, considering I might have to validate many names at once. Guess I'll have to test it.
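A rough way to test that with the core Benchmark module. Everything here is a sketch under assumptions: `by_regex` and `by_loop` are hypothetical stand-ins (the regex is not the one from my original post), both check labels only, and the sample names are made up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical single-regex validator: 1 to 63 characters per label,
# letters/digits/hyphen, no hyphen at either end of a label.
my $label_re    = qr/[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?/;
my $hostname_re = qr/\A$label_re(?:\.$label_re)*\z/;

sub by_regex { return $_[0] =~ $hostname_re ? 1 : 0 }

# Hypothetical step-by-step validator applying the same rules label by label.
sub by_loop {
    my ($name) = @_;
    return 0 if $name eq '';
    for my $label (split /\./, $name, -1) {
        return 0 if $label eq '' or length($label) > 63;
        return 0 if $label !~ /\A[A-Za-z0-9-]+\z/;
        return 0 if $label =~ /\A-|-\z/;
    }
    return 1;
}

# Made-up sample data; real hosts file entries would be a better test set.
my @names = ('localhost', 'example.com', 'bad-.example', 'a' x 64, 'my.host-01.internal');

# A negative count means "run each variant for at least that many CPU seconds".
cmpthese(-1, {
    regex => sub { by_regex($_) for @names },
    loop  => sub { by_loop($_)  for @names },
});
```

The numbers will depend on the Perl version and on the mix of valid and invalid names, so the sample list should resemble the real input.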

    return on_success() or die;