in reply to what modules you recommend for downloading hundreds of URLs per second in parallel?

You could use plain sockets; see Fetching HTML Pages with Sockets for the basic idea. Here is a working bit of code that you can strip the extra fluff from. It probably has some parts that need improving, too.
#!/usr/bin/perl
use warnings;
use strict;
use Socket;

# doesn't work well for images, but you can fix that
my $url    = "http://zentara.net/index.html";
my $infile = $url;
$infile =~ tr#/#-#;    # turn slashes into dashes to get a usable filename
print "$infile\n";
my $host = "zentara.net";
$| = 1;

my $start = times;     # note: CPU time used so far, not wall-clock time

my ( $iaddr, $paddr, $proto );
$iaddr = inet_aton($host) or die "getUrl: cannot resolve $host\n";
#$iaddr = ( gethostbyname($host) )[4];
$paddr = sockaddr_in( 80, $iaddr );
$proto = getprotobyname('tcp');

unless ( socket( SOCK, PF_INET, SOCK_STREAM, $proto ) ) {
    die "ERROR Dude: getUrl socket: $!";
}
unless ( connect( SOCK, $paddr ) ) {
    die "getUrl connect: $!\n";
}

my @head = (
    # maybe better to use 1.0, instead of 1.1, for "no keep-alive" ??
    "GET $url HTTP/1.0",
    "User-Agent: Mozilla/4.78 [en] (X11; U; Safemode Linux i386)",
    "Pragma: no-cache",
    "Host: $host",
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*",
    "Accept-Language: en"
);
push( @head, "", "" );    # blank line terminates the header block

# Build header and print to socket
my $header = join( "\015\012", @head );
print "sending request\n$header\n\n";

select SOCK;
$| = 1;                   # unbuffer the socket
binmode SOCK;
print SOCK $header;

my $body = '';
open( FH, '>', $infile ) or die "Cannot write $infile: $!\n";

# Note: the response headers come through this loop as well
while (<SOCK>) {
    my $data = $_;
    $data =~ s/[\r\n\t]+$//s;
    $data =~ s/^[\r\n\t]+//s;
    last if $data =~ /^0$/s;    # a lone "0" ends a chunked response
    my $len = length($data);
    print STDOUT "len:$len\n";
    $body .= $data;
    print FH "$data\n";         # write before the exit checks so the closing tag is saved
    last if $data =~ m{</html>$}is;
    if ( $data =~ m{</body>$}is ) {
        $body .= qq|</html>|;
        last;
    }
}

select STDOUT;
close SOCK or warn "getUrl close: $!\n";
close FH;

my $end  = times;
my $diff = $end - $start;
print "Took $diff to access page\n";
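The script above fetches a single page, while the question was about many URLs at once. As a rough sketch only, the same raw-socket idea might be stretched to several connections with the core IO::Socket::INET and IO::Select modules. The names build_request and fetch_all are made up for this illustration, port 80 and the timeouts are assumptions, and the connects here still happen one after another; only the reads are multiplexed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical helper: build a minimal HTTP/1.0 request for one host and path.
sub build_request {
    my ( $host, $path ) = @_;
    return join( "\015\012",
        "GET $path HTTP/1.0",
        "Host: $host",
        "Connection: close",
        "", "" );    # the two empty strings yield the terminating blank line
}

# Hypothetical helper: fetch several [ host, path ] pairs over plain sockets.
# Returns a hash ref mapping "host/path" to the raw response (headers + body).
sub fetch_all {
    my (@targets) = @_;
    my $select = IO::Select->new;
    my ( %key_for, %response );

    for my $target (@targets) {
        my ( $host, $path ) = @$target;
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => 80,      # assumption: plain HTTP on port 80
            Proto    => 'tcp',
            Timeout  => 10,      # assumption: 10 second connect timeout
        );
        unless ($sock) {
            warn "connect to $host failed: $@\n";
            next;
        }
        binmode $sock;
        syswrite( $sock, build_request( $host, $path ) );
        $key_for{$sock}           = "$host$path";
        $response{"$host$path"}   = '';
        $select->add($sock);
    }

    # Read from whichever sockets have data until every server has closed.
    while ( $select->count ) {
        my @ready = $select->can_read(10) or last;    # stop after 10 s of silence
        for my $sock (@ready) {
            my $n = sysread( $sock, my $chunk, 8192 );
            if ($n) {
                $response{ $key_for{$sock} } .= $chunk;
                next;
            }
            # 0 means EOF, undef means a read error: either way we are done
            $select->remove($sock);
            close $sock;
        }
    }
    close $_ for $select->handles;    # anything left over after a timeout
    return \%response;
}
```

Calling fetch_all( [ 'zentara.net', '/index.html' ], [ 'zentara.net', '/' ] ) would hand back the raw responses keyed by host and path; splitting the headers from the body is left out. For hundreds of URLs per second you would also want non-blocking connects and a cap on the number of open sockets, neither of which this sketch attempts.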

I'm not really a human, but I play one on earth CandyGram for Mongo