in reply to Regex for hostname validation

Seconding the other reply: this feels like (almost) an X/Y problem. Skimming the title and first sentence, my off-the-cuff instinct was "just try to resolve the name to an IP with Net::DNS or gethostbyname, and let your libc handle things", and be done with it. I can also see several cases where some of your constraints, while "RFC legal", still wouldn't apply. For example, I've used a custom non-standard internal TLD for local names that was valid in the context where I used it; (internal) DNS certainly would have resolved it, but it would have failed bullet 2.

The cake is a lie.

Re^2: Regex for hostname validation
by hrcerq (Monk) on May 03, 2025 at 01:05 UTC

    You're right, I've not been very specific here. In my defense, that was a bit on purpose, because I think we miss some opportunities when we rush to a solution that just works instead of paying attention to why some approach is not good enough.

    But again, I recognize that giving fewer details than necessary created some doubts. So let me explain what I need here. I'm managing a hosts file that must be periodically updated to filter many domains/hostnames known to be used for ads, trackers, annoyances and malware (by associating them with 0.0.0.0, which somewhat protects the configured machine from accidentally requesting anything from them).

    As you might guess, this file gets very big, and I'd like to filter out any record that's invalid anyway, since it's pointless to add it to the hosts file. Sure, keeping only the legal names is not good enough for this purpose, but I intend to emit warnings on output for those that are not legal.

    Your suggestion to leverage libc to resolve it is nice, because then I know if it'll be resolved or not, which for this purpose is very important. On the other hand, considering these are hosts known to serve bad things, I'd rather avoid the queries, even if I'm not reaching these machines themselves.
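
For illustration, here is a minimal sketch of the offline route: a purely syntactic filter that never queries DNS. The helper names `is_valid_hostname` and `filter_hosts` are hypothetical, and the rules are an assumption (RFC 1123 style: at most 253 characters overall, dot-separated labels of 1 to 63 letters, digits or hyphens, no label starting or ending with a hyphen). They may not match every bullet in the original question, and they say nothing about whether a name actually resolves.

```perl
use strict;
use warnings;

# Hypothetical helper: purely syntactic check, no DNS lookup is made.
# Assumed rules (RFC 1123 style): total length <= 253, dot-separated
# labels of 1-63 characters, letters/digits/hyphens only, and no label
# starting or ending with a hyphen.
sub is_valid_hostname {
    my ($name) = @_;
    return 0 unless defined $name && length $name;
    $name =~ s/\.\z//;    # tolerate a single trailing dot
    return 0 if !length $name || length $name > 253;
    for my $label (split /\./, $name, -1) {
        return 0
            unless $label =~ /\A[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z/;
    }
    return 1;
}

# Hypothetical helper: split hosts-file lines into kept records and
# rejected names, so the caller can warn about the latter.
sub filter_hosts {
    my (@lines) = @_;
    my (@keep, @rejected);
    for my $line (@lines) {
        (my $entry = $line) =~ s/#.*//;    # strip trailing comments
        my ($ip, @names) = split ' ', $entry;
        next unless defined $ip;           # blank or comment-only line
        for my $name (@names) {
            if (is_valid_hostname($name)) { push @keep, "$ip $name" }
            else                          { push @rejected, $name }
        }
    }
    return (\@keep, \@rejected);
}
```

A caller could then write the kept records to the hosts file and run something like `warn "not a legal hostname: $_\n" for @$rejected;` to get the warnings mentioned above. Note that a non-standard internal TLD passes this check, since only the shape of each label is tested.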

    return on_success() or die;