costanza has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone,

I have a CSV file with a large list of domains and need to do DNS lookups against it. The input file is like this:

1,host1.foo.bar
2,host2.foo.bar
n,hostn.foo.bar

I basically need to check whether the domains are resolvable. With a 1 second timeout per DNS lookup and a list of 50,000 domains, this can take quite a while. I haven't used Perl threads before, but I think this would be a good candidate for them. Here is my code:

use strict;
use warnings;
use Net::DNS;

my $dns = Net::DNS::Resolver->new;
$dns->tcp_timeout(1);
$dns->udp_timeout(1);

my $my_file = $ARGV[0];
open(my $infile, '<', $my_file) or die "I refuse to open your file ($my_file): $!";
my @my_data = <$infile>;
close($infile);

foreach my $line (@my_data) {
    ## Parse the file
    chomp($line);    # chomp strips only the trailing newline (chop always removes the last character)
    next unless length $line;
    my ($number, $domain) = split(/,/, $line);
    testDns($number, $domain, $dns);
}

sub testDns {
    my ($line_number, $domain_to_test, $default_resolver) = @_;
    my $ns = $default_resolver->query($domain_to_test, 'NS');
    if ($ns) {
        foreach my $rr ($ns->answer) {
            if ($rr->type eq "NS") {
                my @auth_nameservers = $rr->nsdname;
                foreach my $auth_nameserver (@auth_nameservers) {
                    print "$line_number,$domain_to_test,$auth_nameserver\n";
                }
            }
        }
    }
    else {
        print "$line_number,$domain_to_test,error\n";
    }
}

How can I implement multiple threads in the above code?

Thanks, George

Re: Multithreaded processing of a CSV file
by Corion (Patriarch) on May 13, 2015 at 07:02 UTC

    No, don't use threads for that.

    You should restructure your program so that it first issues the DNS requests for all domains and only afterwards checks the responses it receives from the resolver. If you want to keep using Net::DNS::Resolver, this can be done with its ->bgsend method. If you want to use AnyEvent::DNS instead, combining it with a condition variable ($cv->begin before each request, $cv->end in that request's callback) should also let you fire off all the DNS requests in one go.
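    A minimal sketch of the bgsend approach, assuming a reasonably recent Net::DNS that provides bgsend(), bgbusy() and bgread() (older releases offered bgisready() instead of bgbusy()). check_batch() is a made-up helper name, and the batch size of 200 and the 2 second deadline are illustrative guesses, not tuned values. The queries go out in batches because each pending background query keeps its own socket open.

```perl
#!/usr/bin/perl
# Sketch: send every NS query in a batch first, then collect the answers,
# using Net::DNS::Resolver's bgsend()/bgbusy()/bgread().
# check_batch() is a hypothetical helper; 200 and 2 are example values.
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Sends an NS query for every [number, domain] pair in @batch, then reads
# the answers back. All queries in the batch share one $timeout deadline.
sub check_batch {
    my ($res, $timeout, @batch) = @_;

    # Phase 1: bgsend() returns a handle immediately, without waiting.
    my @pending = map { [ @$_, $res->bgsend($_->[1], 'NS') ] } @batch;

    # Phase 2: collect the answers in input order.
    my $deadline = time() + $timeout;
    my @lines;
    for my $p (@pending) {
        my ($number, $domain, $handle) = @$p;
        my $packet;
        if ($handle) {
            # Poll until this answer is in or the shared deadline has passed.
            sleep 0.01 while $res->bgbusy($handle) && time() < $deadline;
            $packet = $res->bgread($handle) unless $res->bgbusy($handle);
        }
        # Keep only NS records; no packet or no NS records counts as an error.
        my @ns = $packet ? (grep { $_->type eq 'NS' } $packet->answer) : ();
        push @lines, @ns ? (map { join ',', $number, $domain, $_->nsdname } @ns)
                         : ("$number,$domain,error");
    }
    return @lines;
}

if (@ARGV) {
    require Net::DNS;    # loaded at run time; only needed for real lookups
    my $res = Net::DNS::Resolver->new;
    $res->udp_timeout(1);
    $res->tcp_timeout(1);

    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my @rows = <$fh>);
    close $fh;
    my @jobs = map { [ split /,/, $_, 2 ] } grep { length } @rows;

    # Work in batches so that 50,000 sockets are never open at once.
    while (my @batch = splice @jobs, 0, 200) {
        print "$_\n" for check_batch($res, 2, @batch);
    }
}
```

    Called with the CSV file as its only argument, it prints the same number,domain,nameserver lines as the original code. Unlike query(), bgread() also returns a packet for NXDOMAIN replies, which is why an answer without NS records is reported as an error explicitly.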

Re: Multithreaded processing of a CSV file
by afoken (Chancellor) on May 13, 2015 at 07:04 UTC

    Your problem of resolving domain names has nothing to do with the fact that you read your data from a CSV file; consider editing the title of your post.

    I think the usual approach is to create a limited number of worker threads and make each of the workers resolve one domain at a time. The main thread tells the worker threads which domain to resolve next.
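    A minimal sketch of that worker-pool design using the core threads and Thread::Queue modules, assuming a perl built with thread support. check_domains() and make_ns_lookup() are hypothetical helpers, and the pool size of 20 is an arbitrary example value. Each worker builds its own Net::DNS::Resolver on first use, and the lookup is passed in as a code reference so a different check can be swapped in.

```perl
#!/usr/bin/perl
# Sketch of a worker-thread pool: the main thread queues [number, domain]
# jobs and a fixed number of workers dequeue and resolve them.
# check_domains() and make_ns_lookup() are hypothetical helpers.
use strict;
use warnings;
use threads;
use Thread::Queue;

# Returns a closure that looks up the NS records of one domain and returns
# lines in the same "number,domain,nameserver" format as the question.
sub make_ns_lookup {
    my $res;    # still undef when the threads are cloned, so each worker builds its own
    return sub {
        my ($number, $domain) = @_;
        unless ($res) {
            require Net::DNS;
            $res = Net::DNS::Resolver->new;
            $res->udp_timeout(1);
            $res->tcp_timeout(1);
        }
        my $packet = $res->query($domain, 'NS') or return "$number,$domain,error";
        return map { join ',', $number, $domain, $_->nsdname }
               grep { $_->type eq 'NS' } $packet->answer;
    };
}

# Resolves every job with $n_workers threads and returns all output lines,
# in the order the workers finish them (not input order).
sub check_domains {
    my ($jobs, $n_workers, $lookup) = @_;
    my $work    = Thread::Queue->new;
    my $results = Thread::Queue->new;

    my @workers = map {
        threads->create(sub {
            # dequeue() blocks until a job arrives; undef means "no more work"
            while (defined(my $job = $work->dequeue)) {
                $results->enqueue($_) for $lookup->(@$job);
            }
        });
    } 1 .. $n_workers;

    $work->enqueue(@$jobs);
    $work->enqueue(undef) for @workers;    # one stop marker per worker
    $_->join for @workers;

    my @lines;
    while (defined(my $line = $results->dequeue_nb)) {
        push @lines, $line;
    }
    return @lines;
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my @rows = <$fh>);
    close $fh;
    my @jobs = map { [ split /,/, $_, 2 ] } grep { length } @rows;
    print "$_\n" for check_domains(\@jobs, 20, make_ns_lookup());
}
```

    Since the results come back in completion order, sort them afterwards if the original line order matters.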

    Do you know Parallel::ForkManager? It uses separate processes instead of threads, but the examples in the documentation look promising for your problem.
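    A minimal sketch with Parallel::ForkManager, assuming a release where finish() accepts a data structure and run_on_finish() receives it, which is used here to pass results back to the parent. run_in_children() and ns_lines() are hypothetical helpers, and 20 processes with 500 domains per child are illustrative numbers. Giving each child a chunk of domains, rather than forking once per domain, keeps the number of forks for 50,000 domains to about a hundred.

```perl
#!/usr/bin/perl
# Sketch: resolve the domain list in parallel child processes with
# Parallel::ForkManager, one chunk of domains per child.
# run_in_children() and ns_lines() are hypothetical helpers.
use strict;
use warnings;
use Parallel::ForkManager;

# Runs $lookup->(number, domain) for every job, $chunk_size jobs per child
# and at most $max_procs children at a time. Returns all output lines in
# the order the children finish.
sub run_in_children {
    my ($jobs, $max_procs, $chunk_size, $lookup) = @_;
    my $pm = Parallel::ForkManager->new($max_procs);

    my @lines;
    # Called in the parent whenever a child exits; $data is the array
    # reference that child passed to finish().
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
        push @lines, @$data if $data;
    });

    my @todo = @$jobs;
    while (my @chunk = splice @todo, 0, $chunk_size) {
        $pm->start and next;                        # parent: go fork the next chunk
        my @out = map { $lookup->(@$_) } @chunk;    # child: resolve its chunk
        $pm->finish(0, \@out);                      # child exits, handing @out back
    }
    $pm->wait_all_children;
    return @lines;
}

# Same output format as the original testDns(), returned instead of printed.
sub ns_lines {
    my ($res, $number, $domain) = @_;
    my $packet = $res->query($domain, 'NS') or return "$number,$domain,error";
    return map { join ',', $number, $domain, $_->nsdname }
           grep { $_->type eq 'NS' } $packet->answer;
}

if (@ARGV) {
    require Net::DNS;    # only needed for real lookups
    my $res = Net::DNS::Resolver->new;
    $res->udp_timeout(1);
    $res->tcp_timeout(1);

    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my @rows = <$fh>);
    close $fh;
    my @jobs = map { [ split /,/, $_, 2 ] } grep { length } @rows;

    print "$_\n" for run_in_children(\@jobs, 20, 500, sub { ns_lines($res, @_) });
}
```

    Collecting the lines in the parent, rather than printing from each child, avoids output from different children getting interleaved.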

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Multithreaded processing of a CSV file
by QM (Parson) on May 13, 2015 at 08:42 UTC
    I second Parallel::ForkManager. I've got a nifty little ssh tool that uses it, and it gives good control.

    If you really need a 1 second timeout rather than 2, you may run into some problems with name resolution in general.

    Finally, note that nslookup has an interactive mode for multiple lookups, which also takes command line options, such as -timeout=1. Using these options gives great results on a list of 50, even for non-existent hosts:

    time nslookup -timeout=1 -nosearch -norecurse -retry=0 -fail <<HERE
    host.that.exists
    host.that.doesnt
    ...
    last.one.honest
    exit
    HERE

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Multithreaded processing of a CSV file
by costanza (Initiate) on May 13, 2015 at 16:41 UTC
    Thought I responded, but it doesn't look like it. This was my first post here; thanks very much for all of the quick and informative responses. I used Parallel::ForkManager, which worked great for this task. Thanks again!