Extracting domain names from FQDNs

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I am providing way too much information here in the hopes that someone may see what I'm attempting to do and offer a solution that will address the entire problem as opposed to the specific "domain name" problem that I'm addressing.

Here's the basic scenario: we have a Web site that allows dealers to sign up with manufacturers and have an entire e-commerce site created with just the upload of a spreadsheet that the dealer fills out. However, the dealer still needs a domain. After the domain is entered, the manufacturer is supposed to be able to enter the URL into one of our Web pages and the following should occur:

The Technical contact should appear on the page.

If there is no technical contact, default to the administrative contact.

Default: dump all information to the page.
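
The fallback order above can be sketched as follows. This is only an illustration, assuming the whois reply has already been parsed into a hash; the keys `technical`, `administrative`, and `raw` and the function name `display_text` are hypothetical and do not come from any of the whois modules mentioned in this post.

```perl
use strict;
use warnings;

# $record is a hypothetical hash reference of already-parsed whois data.
# The key names are assumptions made for this sketch.
sub display_text {
    my ($record) = @_;

    # Prefer the technical contact, then the administrative contact.
    for my $role (qw(technical administrative)) {
        my $contact = $record->{$role};
        return $contact if defined $contact && length $contact;
    }

    # Default: dump everything we received.
    return $record->{raw};
}
```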

So far, everything works perfectly, unless the dealer's domain is registered under a foreign TLD. Net::Whois is terribly out of date and will not return information for maserith.com. btrott recommended Net::ParseWhois and that's what I'm using, but it will not return information for lexicon.co.uk. It was also recommended that I try Net::XWhois, but that also fails to retrieve information for lexicon.co.uk.

The only thing that has been successful for foreign top-level domains is Net::Whois::Raw. The information comes back in a raw format, which is ugly, but I can still send it to the page. You can test it with the following:

perl -MNet::Whois::Raw -e "print whois(\"lexicon.co.uk\")"

The problem, however, is that all versions of whois expect the domain name without the host. The following snippet generally gets this info:

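#!C:\perl\bin\perl.exe -w
use strict;
use Socket;    # provides inet_aton

my $arg = shift @ARGV or die "Need a domain, dummy!\n";

# Skip an optional leading "scheme://" and capture everything up to the next slash.
my ( $domain ) = ( $arg =~ m!^(?:[^/]+/?/)?([^/]+)! );

print "Domain is " . get_domain( $domain ) . "\n";

# Build the name up from the rightmost label and return the first
# (shortest) name that resolves to an address, or 0 if none does.
sub get_domain {
    my @segment = reverse split /\./, shift;
    my $domain;
    return 0 if $segment[0] =~ /^(?:local|public)$/;
    SEARCH_FOR_DOMAIN: {
        foreach ( @segment ) {
            ( $domain = $_, next ) if not $domain;
            $domain = $_ . ".$domain";
            last SEARCH_FOR_DOMAIN if inet_aton( $domain );
        }
        return 0;    # no candidate resolved
    }
    return $domain;
}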

If it's named domain.pl, I can use the following:

domain.pl http://www.perlmonks.org/
domain.pl http://perlmonks.org/
domain.pl www.perlmonks.org
etc.

Any of the above will print "Domain is perlmonks.org". Unfortunately, it will print "Domain is www.lexicon.co.uk" if I enter www.lexicon.co.uk. In this case, the domain is actually lexicon.co.uk. Why is this a problem? Because the aforementioned whois queries will fail if I supply a host/domain. How do I get just the domain if someone supplies me with a fully qualified domain name (FQDN)? I've tried Net::DNS but it is dog-slow and fails on my system. mdillon provided an interesting example of how to do this, but $q is undefined after the following line:

my $q = $res->send($domain, "SOA");

Even if I could find out why it's doing that, Net::DNS is too slow for my needs. Anyone have any ideas on how to get that domain quickly? We're trying to automate as much as possible, so having to figure this out by hand is the last option.

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.


Replies are listed 'Best First'.
Re: Extracting domain names from FQDNs by mwp (Hermit) on Dec 02, 2000 at 01:17 UTC
Compile a hash/array of common TLDs (there are several good lists out there on the web), then:

Strip off the TLD from the name (i.e., .com, .co.uk, .la): `sub.domain.tld => sub.domain`

The last dot-token must be the domain: `sub.domain => domain`

Re-append the TLD: `domain => domain.tld`

I understand processing speed is a factor, but with a little optimization... =) Good luck.

'kaboo

Update: Here's a start. (I was bored.)

sub parse_fqdn {
    my $fqdn = shift;
    my @tlds = (
        [qw(com net org)],        # common
        [qw(gov edu co\.uk)],     # not-so-common
        [qw(la li it po)]         # downright odd (long list)
    );
    foreach my $level (@tlds) {
        #warn "pass ". ++$pass;
        for (@{$level}) {
            # Each entry is used as a regex fragment, hence co\.uk.
            # Allow hyphens in the label and a name with no host part.
            return $1 if $fqdn =~ /(?:^|\.)([\w-]+\.$_)$/;
        }
    }
    warn "unable to find domain from $fqdn\n";
    return $fqdn;
}

my $domain = parse_fqdn('www.cs.niu.edu');
print "www.cs.niu.edu => $domain\n";    # niu.edu
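
A variation on the same idea, sketched with a hash lookup instead of one regex per TLD, so the cost does not grow with the length of the list. The suffix list here is a tiny illustrative sample, and `%SUFFIX` and `registered_domain` are names made up for this sketch; a real list would have to be compiled from one of the lists mentioned above.

```perl
use strict;
use warnings;

# Illustrative and deliberately incomplete list of suffixes under which
# names are registered directly. This is an assumption, not a registry.
my %SUFFIX = map { $_ => 1 } qw(
    com net org edu gov
    co.uk org.uk ac.uk
    com.au edu.au
);

# Return the registered domain for a host name, or undef when no known
# suffix matches.
sub registered_domain {
    my ($fqdn) = @_;
    my @label = split /\./, lc $fqdn;

    # Try suffixes from longest to shortest, always leaving at least one
    # label in front of the suffix to serve as the domain name.
    for my $i ( 1 .. $#label ) {
        my $suffix = join '.', @label[ $i .. $#label ];
        return join '.', $label[ $i - 1 ], $suffix if $SUFFIX{$suffix};
    }
    return;
}

for my $host (qw(www.lexicon.co.uk www.cs.flinders.edu.au perlmonks.org)) {
    my $domain = registered_domain($host);
    print "$host => ", ( defined $domain ? $domain : '(unknown suffix)' ), "\n";
}
```

Checking the longest suffix first means `co.uk` wins over a bare `uk` entry if both are ever listed, which the ordered regex passes would otherwise have to be arranged by hand to guarantee.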
Re: Extracting domain names from FQDNs by chipmunk (Parson) on Dec 02, 2000 at 01:04 UTC
Would it be possible to start with the last two sections of the domain, and then add one section at a time until Whois gives a result? For example, try 'co.uk', which wouldn't work, then 'lexicon.co.uk', which would. Something like this:

my $host = 'www.lexicon.co.uk';
my @parts = split /\./, $host;
my $domain = pop @parts;
my $whois = '';
while (@parts and not $whois) {
    $domain = pop(@parts) . ".$domain";
    $whois = Whois($domain);
}
print $whois;

With adjustments for the specific Whois module being used, of course.
(Ovid) Re(2): Extracting domain names from FQDNs by Ovid (Cardinal) on Dec 02, 2000 at 01:07 UTC
I've looked into that, but response time is a factor here. Whois is slow enough for the international domains that incrementally testing something like `www.cs.flinders.edu.au` becomes an issue. Thanks for the comment, though!

Cheers,
Ovid
Re: (Ovid) Re(2): Extracting domain names from FQDNs by Blue (Hermit) on Dec 02, 2000 at 01:46 UTC
Ovid,

If speed is an issue, why not fork off some children and search for all variations at once? Assuming the minimum is domain.TLD, www.cs.flinders.edu.au would give you:

www.cs.flinders.edu.au
cs.flinders.edu.au
flinders.edu.au
edu.au

You can further optimize this by taking a list of non-foreign TLDs (.com, .net, .edu, .mil, .org, etc.) and only running this if your domain does not end in one of these. Not a real solution, but perhaps a workaround.

=Blue

...you might be eaten by a grue...
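
Generating the variations, with the shortcut for non-foreign TLDs, might look something like this. It is a rough sketch: `%GENERIC` and `whois_candidates` are names invented here, the TLD list is only a sample, and the forking itself is left out. Each returned name would be handed to its own child process or whois query.

```perl
use strict;
use warnings;

# Sample of TLDs where the registered domain is assumed to be the last
# two labels. Illustrative only.
my %GENERIC = map { $_ => 1 } qw(com net org edu mil gov);

# Return the list of names worth sending to whois for a given host.
sub whois_candidates {
    my ($host) = @_;
    my @label = split /\./, lc $host;
    return () if @label < 2;

    # Shortcut: for a generic TLD, only domain.TLD needs to be tried.
    return ( join '.', @label[ -2, -1 ] ) if $GENERIC{ $label[-1] };

    # Otherwise every suffix of two or more labels, shortest first.
    my @candidate;
    for my $start ( reverse( 0 .. $#label - 1 ) ) {
        push @candidate, join '.', @label[ $start .. $#label ];
    }
    return @candidate;
}

print join( "\n", whois_candidates('www.cs.flinders.edu.au') ), "\n";
```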
Re: Extracting domain names from FQDNs by jepri (Parson) on Dec 02, 2000 at 08:06 UTC
Ovid, part of your problem appears to be in dealing with the new distributed whois system. There is a script which contains a list of definitions for default servers; this may be of assistance in getting your modules to correctly look up foreign domain names. Here is another script with an even larger list of definitions. This one appears to be queen of the lists right now.

I didn't realise it, but it appears each country now has its own whois database and you have to query the right one. Naturally there are no easy naming conventions. I have found only one script which does the job properly, by contacting the root whois server, getting a reference to the correct server for the country, then querying that one. When in doubt, plunder a GPL project.

____________________
Jeremy
I didn't believe in evil until I dated it.
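
The referral approach described above could be sketched roughly as follows, using only the core IO::Socket::INET module. The function names are made up for this sketch, and it assumes the root server is `whois.iana.org` and that its reply names the next server on a `refer:` line. Neither assumption comes from the script mentioned above, and not every TLD record will carry a referral.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Send one query to a whois server (TCP port 43) and return the full reply.
sub whois_query {
    my ( $server, $query ) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $server,
        PeerPort => 43,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "Cannot connect to $server: $@\n";
    print {$sock} "$query\r\n";
    local $/;    # slurp mode: read until the server closes the connection
    my $reply = <$sock>;
    close $sock;
    return defined $reply ? $reply : '';
}

# Pull the referred server out of a reply, assuming a "refer:" line.
# Returns undef when the reply carries no referral.
sub referral_server {
    my ($reply) = @_;
    return $reply =~ /^refer:\s*(\S+)/mi ? $1 : undef;
}

# Ask the root server which server to use, then ask that server.
sub whois_via_referral {
    my ( $domain, $root ) = @_;
    $root = 'whois.iana.org' unless defined $root;    # assumed root server
    my $server = referral_server( whois_query( $root, $domain ) );
    return defined $server ? whois_query( $server, $domain ) : undef;
}
```

Calling `whois_via_referral('lexicon.co.uk')` costs two round trips per lookup, so caching the referred server per TLD would be worth considering where response time matters.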