Re: Efficient DNS lookup

I'm going to assume that you are doing something like reading a log file, which involves a *LOT* of IP-to-name resolutions.

If this is the case, then you can gain a large speed increase by running multiple name resolutions in parallel using Net::DNS. Most of the delay in a DNS lookup is time spent waiting for the answer to come back over the network. There is no reason why you can't just go on about your work while waiting.

In addition, a resolution cache of some sort, so that you never do the same lookup more than once, is helpful.

If I were implementing this, I would do the following:

Write a function called something like "startresolve", to which you pass $IP. This function would look something like:

WARNING: Untested Semi-pseudocode follows

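my (%ipcache, @tolookup);   # shared with the maintenance function

sub startresolve
{
  my $iptolookup = shift;
  # Queue each IP only once; the cache entry doubles as the "seen" flag.
  if (!exists $ipcache{$iptolookup})
  {
    $ipcache{$iptolookup} = "UNRESOLVED";
    push @tolookup, $iptolookup;
  }
}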

I would also create a "maintenance" function which handles the %ipcache hash and the @tolookup array. I would steal the main program loop from the mresolv sample code (search on CPAN). However, it would only run once per function call (i.e. the function would be just the body of the while(1) loop). This loop basically keeps $opt_n resolutions going at once. I'd replace the $eof test with a check for any items left in @tolookup, change it to take the $name from @tolookup, and store any answers back in %ipcache.
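As a rough illustration of the maintenance function described above, here is a minimal sketch. It is not the mresolv code: it polls with bgbusy instead of using IO::Select, and the names %pending, $max_parallel and the $res argument are my own assumptions. bgsend, bgbusy and bgread are Net::DNS::Resolver methods. Falling back to the bare IP when no PTR record comes back is also just one possible choice.

```perl
use strict;
use warnings;

my %ipcache;             # IP => name, or "UNRESOLVED" while the lookup is pending
my @tolookup;            # IPs queued by startresolve
my %pending;             # IP => handle returned by bgsend (hypothetical name)
my $max_parallel = 8;    # plays the role of mresolv's $opt_n; the value is arbitrary

# One pass of the resolver loop. $res is a Net::DNS::Resolver, or any
# object offering the same bgsend/bgbusy/bgread methods.
sub domaintenance {
    my ($res) = @_;

    # Collect finished answers first so their slots are free for new queries.
    for my $ip (keys %pending) {
        next if $res->bgbusy($pending{$ip});    # no answer yet
        my $packet = $res->bgread(delete $pending{$ip});
        my ($ptr) = $packet ? (grep { $_->type eq 'PTR' } $packet->answer) : ();
        $ipcache{$ip} = $ptr ? $ptr->ptrdname : $ip;    # fall back to the bare IP
    }

    # Top up to $max_parallel outstanding queries.
    while (@tolookup and keys(%pending) < $max_parallel) {
        my $ip     = shift @tolookup;
        my $handle = $res->bgsend($ip, 'PTR');
        if ($handle) { $pending{$ip} = $handle }
        else         { $ipcache{$ip} = $ip }            # the send itself failed
    }

    # Number of lookups still queued or in flight.
    return scalar(keys %pending) + @tolookup;
}
```

With $res = Net::DNS::Resolver->new, the main loop would call domaintenance($res) every so often, and a final drain could be written as "while (domaintenance($res)) { select(undef, undef, undef, 0.05) }", which sleeps briefly between polls.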

Now, what you do is call this function once per program loop (or more often as necessary). This effectively creates another "thread" which handles the name cache.

So, to get back to where we were: your main program loop calls startresolve($IP) whenever it has an IP in hand that you know you will need a name for, and calls the "maintenance" function whenever it is convenient (not too often, but not too infrequently: you want to keep the resolver full, but you don't want to check on it every half microsecond). Remember that you aren't going to have the name for the $IP at this point, but it will be available later (think background process).

When you are done grabbing IPs which need to be resolved, you should call the "maintenance" function until there are no more DNS queries waiting or in flight (@tolookup is empty and $sel->count is zero). Probably just a "while (@tolookup or $sel->count) { domaintenance(); }" type of line. This will finish up the resolutions.

At this point you should have a hash of all the IPs you need in %ipcache, which you can then use as you spit out the data you collected earlier.
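For the output step, something along these lines might do. This is only a sketch: the function names name_for and annotate_line are made up for illustration, the cache is passed in as a hash reference, and the pattern only looks for dotted-quad IPv4 addresses.

```perl
use strict;
use warnings;

# Return the cached name for an IP, or the IP itself if the lookup
# never finished or found nothing.
sub name_for {
    my ($ip, $cache) = @_;
    my $name = $cache->{$ip};
    return (defined $name && $name ne 'UNRESOLVED') ? $name : $ip;
}

# Replace every IPv4-looking token in a line with its cached name.
sub annotate_line {
    my ($line, $cache) = @_;
    $line =~ s/\b(\d{1,3}(?:\.\d{1,3}){3})\b/name_for($1, $cache)/ge;
    return $line;
}
```

You would call it as annotate_line($line, \%ipcache) for each line you saved earlier.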

This general technique can be used in other ways. For instance, before SpamAssassin existed, I needed to look up an IP on several anti-spam blackhole lists. I just used the technique described here to start a lookup on ALL of the RBL lists in parallel, and then waited for the responses to come back (or for a timeout to elapse).
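The blackhole-list case could be sketched roughly as follows. This is not the code I used back then. The function names, the $res argument (a Net::DNS::Resolver or a lookalike) and the polling interval are assumptions, and the zone names you pass in are up to you. It relies on the usual DNSBL convention of reversing the octets, appending the list's zone, and treating any A record in the answer as "listed".

```perl
use strict;
use warnings;

# Build the query name for one list: octets reversed, then the zone.
sub rbl_name {
    my ($ip, $zone) = @_;
    return join('.', reverse(split /\./, $ip), $zone);
}

# Start one background query per list, then poll until every list has
# answered or $timeout seconds (whole seconds, via time()) have passed.
sub rbl_check {
    my ($res, $ip, $timeout, @zones) = @_;

    my %pending;    # zone => handle returned by bgsend
    for my $zone (@zones) {
        my $handle = $res->bgsend(rbl_name($ip, $zone), 'A');
        $pending{$zone} = $handle if $handle;
    }

    my %listed;
    my $deadline = time() + $timeout;
    while (%pending and time() < $deadline) {
        for my $zone (keys %pending) {
            next if $res->bgbusy($pending{$zone});    # still waiting
            my $packet = $res->bgread(delete $pending{$zone});
            # Any A record in the answer means the list has an entry.
            $listed{$zone} =
                ($packet && grep { $_->type eq 'A' } $packet->answer) ? 1 : 0;
        }
        select(undef, undef, undef, 0.05) if %pending;    # short pause between polls
    }
    return \%listed;    # lists that never answered are simply absent
}
```

A call would look like rbl_check($res, $ip, 5, @zones), where @zones holds the zone names of whichever lists you want to consult.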

One caveat I see in the example code I mentioned is that it appears it MIGHT wait for all of the queries to come back before proceeding. Generally you don't want to do this, but instead you want to just get the ones which have come back and let the others wait. ALSO, you may want to check for the responses first, then send the new queries (flop the two). That way, you will have more slots available for a query and it should go faster.

Again, let me mention I haven't tested the code above. I intend this as a starting point to get you headed in the right direction.

Re: Re: Efficient DNS lookup by fourmi (Scribe) on Aug 19, 2003 at 09:59 UTC
Okay guys, thanks a lot. I now have a few things to play with to get it all ticking over a bit quicker. Will post results! cheers ant