rementis has asked for the wisdom of the Perl Monks concerning the following question:

Hello again to all you wise and powerful monks!

I have what I think is going to be a simple question...

I have a subroutine that uses Socket and gethostbyaddr to resolve the hostname of an IP address. This subroutine needs to run about 35,000 times, and unfortunately in many cases the hostname will not be found and the sub will time out. This does not cause any problems except that it takes quite a few seconds to time out, making the script run way longer than necessary. How can I make gethostbyaddr time out sooner, say in 2 seconds? I found one solution in the Perl Cookbook using eval, alarm, and a SIGALRM handler, but I was hoping to see a better way to get this done. I am not married to gethostbyaddr, by the way. Thanks in advance!

Re: Timout a gethostbyaddr?
by shigetsu (Hermit) on Mar 22, 2007 at 23:15 UTC
    You probably want Time::Out:
    use Time::Out;

    my $retval = timeout 2 => affects {
        # some statement
    };
    print $retval;
      Thanks for this one, I probably should have searched CPAN before bothering everyone. :)

      The docs for Time::Out suggest that it cannot interrupt blocking I/O on Windows, however, so I will have to test and see.

        Run for your life, Time::Out uses alarm
Re: Timout a gethostbyaddr?
by Zaxo (Archbishop) on Mar 23, 2007 at 03:40 UTC

    Forking lookups should help, since processes awaiting the system effectively sleep. Parallel::ForkManager would be a big help for that.

    If you're on a platform which prefers threads, I can't help much, but the fork emulation might be fine for this.
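    As a rough illustration of the forking idea, here is a minimal sketch that uses plain fork and pipes from core Perl rather than Parallel::ForkManager (which would take over the process bookkeeping). The names parallel_lookup and $lookup are made up for this example; $lookup stands for whatever resolver call you use, for instance one wrapping gethostbyaddr. It assumes a platform with a real fork.

```perl
use strict;
use warnings;
use POSIX ();

# Resolve the addresses in @$ips using $workers child processes.
# $lookup is a code ref that takes an IP and returns a hostname or undef.
# Returns a hash ref mapping each IP to its hostname (undef if not found).
# (parallel_lookup is a hypothetical helper, not part of any module.)
sub parallel_lookup {
    my ($ips, $workers, $lookup) = @_;
    my (@readers, @pids);

    for my $w (0 .. $workers - 1) {
        pipe(my $rd, my $wr) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;

        if ($pid == 0) {                      # child process
            close $rd;
            close $_ for @readers;            # read ends inherited from the parent
            # Worker $w handles every $workers-th address, starting at index $w.
            for (my $i = $w; $i < @$ips; $i += $workers) {
                my $name = $lookup->($ips->[$i]);
                print {$wr} $ips->[$i], "\t", (defined $name ? $name : ''), "\n";
            }
            close $wr;                        # flushes buffered output
            POSIX::_exit(0);                  # skip END blocks in the child
        }

        close $wr;                            # parent keeps only the read end
        push @readers, $rd;
        push @pids, $pid;
    }

    my %result;
    for my $rd (@readers) {
        while (my $line = <$rd>) {
            chomp $line;
            my ($ip, $name) = split /\t/, $line, 2;
            $result{$ip} = length $name ? $name : undef;
        }
        close $rd;
    }
    waitpid($_, 0) for @pids;                 # reap the children
    return \%result;
}
```

    While one child is stuck waiting for a resolver timeout, the others keep going, so the total run time is roughly divided by the number of workers.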

    After Compline,
    Zaxo

Re: Timout a gethostbyaddr?
by ikegami (Patriarch) on Mar 22, 2007 at 23:15 UTC

    I found one solution in the Perl Cookbook using eval, alarm, and a SIGALRM handler, but I was hoping to see a better way to get this done

    What's wrong with alarm?
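    For reference, the eval/alarm pattern mentioned in the question can be sketched roughly like this. The helper names with_timeout and resolve_ip are invented for the example. One caveat: Perl 5.8 and later defer signal handlers until the current opcode finishes, so a blocking C call such as gethostbyaddr may not be interrupted as promptly as you would hope; it is worth testing on your platform.

```perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Run $code, giving up after $seconds. Returns whatever $code returns,
# or undef if the alarm fired first. (with_timeout is a hypothetical name.)
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };  # "\n" keeps line info out of $@
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;                      # make sure no alarm stays pending if $code died
    unless ($ok) {
        return undef if $@ eq "timeout\n";
        die $@;                   # some other error: pass it on
    }
    return $result;
}

# Reverse-resolve a dotted-quad address, waiting at most $seconds.
sub resolve_ip {
    my ($ip, $seconds) = @_;
    my $packed = inet_aton($ip) or return;
    return with_timeout($seconds, sub { scalar gethostbyaddr($packed, AF_INET) });
}
```

    A call would then look like my $name = resolve_ip('192.0.2.1', 2); with undef meaning "not found or timed out".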

    This subroutine needs to run about 35,000 times

    Sounds like some parallelism would help here. Have you considered issuing multiple requests at the same time, either asynchronously using Net::DNS (see the select example) or with threads?

Re: Timout a gethostbyaddr?
by shmem (Chancellor) on Mar 22, 2007 at 23:28 UTC
    AFAIK nobody is married to gethostbyaddr anyways ;-)

    But your 35,000 IP addresses will certainly resemble a tree structure: that of the DNS itself. Cache how responsive PTR lookups are, from the highest byte to the lowest, and that will result in some gain. Timeouts depend on what your OS's resolver dishes out, and you can override those with alarm calls.
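    One possible reading of the caching idea, as a minimal sketch: count failed lookups per three-octet prefix and stop trying addresses in a block once it has failed too often. This assumes each such block maps to a single reverse zone, which is common but not guaranteed. The names cached_lookup, prefix_of and $MAX_TIMEOUTS are made up for the example, and $lookup stands for whatever resolver call you use.

```perl
use strict;
use warnings;

my %timeouts;             # "a.b.c" prefix => number of failed lookups so far
my $MAX_TIMEOUTS = 3;     # hypothetical threshold; tune it for your data

# First three octets of a dotted quad: "10.1.2.3" becomes "10.1.2".
sub prefix_of {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return join '.', @octets[0 .. 2];
}

# $lookup is any code ref that returns a hostname, or undef on failure.
# For simplicity every undef is counted as a timeout here; a fuller version
# would tell a quick "no such record" answer apart from a real timeout.
sub cached_lookup {
    my ($ip, $lookup) = @_;
    my $prefix = prefix_of($ip);

    # This block has already failed too often: skip the lookup entirely.
    return undef if ($timeouts{$prefix} || 0) >= $MAX_TIMEOUTS;

    my $name = $lookup->($ip);
    $timeouts{$prefix}++ unless defined $name;
    return $name;
}
```

    With 35,000 addresses, most of the saving comes from blocks whose nameserver never answers: after the third failure the rest of the block costs nothing.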

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Timout a gethostbyaddr?
by salva (Canon) on Mar 23, 2007 at 08:22 UTC
    On most Unix systems the resolver timeout is configurable in /etc/resolv.conf. For instance, on Linux, man resolv.conf says:
    options
      Options allows certain internal resolver variables to
      be  modified. The syntax is
    
               options option ...
    
      where option is one of the following:
    
      ...
    
        timeout:n
          sets  the amount of time the resolver will wait for a
          response from a remote name server before retrying the
          query via a different name server.  Measured in seconds,
          the default is RES_TIMEOUT (currently 5, see <resolv.h>).
    
    The drawback is that it is a system wide setting that (AFAIK) can not be changed for specific applications.

    update: hey!, actually it can be changed in a per process manner...

      The  options keyword of a system's resolv.conf file
      can be amended on a per-process basis by setting the
      environment  variable "RES_OPTIONS" to  a space-separated
      list of resolver options as explained above under
      options.
    
    So, probably doing...
    $ENV{RES_OPTIONS} = "timeout:$timeout";
    before calling gethostbyname will do what you want!
      $ENV{RES_OPTIONS} = "timeout:$timeout";

      It might be worth noting that what you specify here will probably not be the overall timeout, because in a typical scenario the resolver is configured to query multiple servers, with multiple retries per server. The timeout applies to every single query performed. You should be able to control the number of retries by saying something like $ENV{RES_OPTIONS} = "timeout:2 attempts:1";.

      However, when I was just playing with this (on Linux, btw), it didn't quite behave as expected: the attempts setting did not influence the number of retries (as stated in the manpage), but apparently limited the number of nameservers queried. There were still 4 retries/queries to the same nameserver... (confirmed by strace-ing a sample (failing) lookup).
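      Put together, the RES_OPTIONS approach might look like the sketch below. It assumes a resolver library that honours RES_OPTIONS (as the Linux man page quoted above describes), and the sub name reverse_lookup is invented for the example. As far as I know the variable is read when the resolver initializes, so it is safest to set it before the first lookup in the process.

```perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Set this before the first lookup, so the resolver sees it when it initializes.
# The values are the ones discussed above: 2 seconds per query, 1 attempt.
$ENV{RES_OPTIONS} = "timeout:2 attempts:1";

# Reverse-resolve a dotted-quad address; returns the hostname or undef.
# (reverse_lookup is a hypothetical helper name.)
sub reverse_lookup {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return;
    return scalar gethostbyaddr($packed, AF_INET);
}
```

      Given the observations above about how attempts behaved in practice, timing a known-bad address is the only reliable way to see what overall timeout you actually get.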

Re: Timout a gethostbyaddr?
by almut (Canon) on Mar 23, 2007 at 14:06 UTC

    Another option would be to use Net::DNS::Resolver - at least if you're happy with DNS lookups only (AFAICT, it doesn't do local lookups via /etc/hosts). The Net::DNS suite of modules allows very flexible configuration. Among other things you can configure timeouts, list of nameservers to query and number of retries per server (those should be the most interesting options to reduce lookup times...).

    Here's a simple example, which you can use for both forward and reverse lookups:

    use Net::DNS::Resolver;

    my $res = Net::DNS::Resolver->new(
        nameservers => [qw(10.0.5.10)],  # specify your own here
        udp_timeout => 2,
        retry       => 1,
        #debug      => 1,
    );

    my $host = '123.123.123.123';

    if (my $pkt = $res->query($host)) {
        for my $answer ( $pkt->answer() ) {
            my $type = $answer->type();
            if ($type eq "PTR") {
                print $answer->ptrdname(), "\n";
            } elsif ($type eq "A") {
                print $answer->name(), "\n";
            }
        }
    }