Here's the benchmark. I'd love some help interpreting it, because I don't know what to make of the results. Judging by eye, LWP's get() used the most memory, but I can't grok the huge difference in wall-clock time. Incidentally, to avoid spamming my favourite genomic-annotation provider I tested a much smaller file (about 10k). I don't think I could really run a test with more than 10 iterations on any of the bigger files, so if FTP has a long connect lag at the front, a larger file might make it more competitive.
The code:
use strict;
use Benchmark;
use Net::FTP;
use LWP::Simple;
sub lwp_simple {
my $data = get('ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL.out_xl.gz');
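# get() returns undef on failure, so there is nothing to write in that case
if (!defined $data) { warn "LWP get() failed\n"; return; }
open(my $out, '>', 'LL_tmpl.gz') or die "Can't write LL_tmpl.gz: $!";
binmode $out;
print $out $data;
close($out);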
sleep 1;
}
sub net_ftp {
my $ftp;
if (!($ftp = Net::FTP->new('ftp.ncbi.nih.gov', Debug=>0))) {
print "Couldn't connect: $@\n";
return;
};
$ftp->login('anonymous', 'anon@anon.com');
$ftp->cwd('/refseq/LocusLink/');
$ftp->type('binary');
$ftp->get('LL.out_xl.gz');
$ftp->quit();
sleep 1;
}
sub lwp_getstore {
my $url = 'ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL.out_xl.gz';
my $file = 'LL.out_xl.gz';
getstore($url, $file);
sleep 1;
}
timethese(100, {
'LWP' => \&lwp_simple,
'FTP' => \&net_ftp,
'LWP-Store' => \&lwp_getstore
}
);
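To check the guess about a long connect lag at the front of each FTP session, one option is to time the phases separately. The following is only a sketch, not part of the benchmark that produced the results: timed() and profile_ftp() are hypothetical helper names, and it assumes the same anonymous login as above.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use Net::FTP;

# Run a code ref; return (elapsed seconds, the code ref's scalar result).
sub timed {
    my ($code) = @_;
    my $t0     = [gettimeofday];
    my $result = $code->();
    return (tv_interval($t0), $result);
}

# Time connect, login and transfer separately for a single download.
# Returns a hash ref of phase => seconds, or nothing if a phase fails.
sub profile_ftp {
    my ($host, $dir, $file) = @_;
    my %seconds;

    my ($elapsed, $ftp) = timed(sub { Net::FTP->new($host, Debug => 0) });
    return unless $ftp;
    $seconds{connect} = $elapsed;

    my $ok;
    ($elapsed, $ok) = timed(sub { $ftp->login('anonymous', 'anon@anon.com') });
    return unless $ok;
    $seconds{login} = $elapsed;

    ($elapsed, $ok) = timed(sub {
        $ftp->cwd($dir) && $ftp->binary && $ftp->get($file);
    });
    return unless $ok;
    $seconds{transfer} = $elapsed;

    $ftp->quit;
    return \%seconds;
}
```

Calling profile_ftp('ftp.ncbi.nih.gov', '/refseq/LocusLink/', 'LL.out_xl.gz') once or twice and printing the three values should show whether connect and login dominate the transfer for a file this small.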
The results:
Benchmark: timing 100 iterations of FTP, LWP, LWP-Store...
FTP: 4011 wallclock secs
( 2.31 usr + 2.68 sys = 5.00 CPU) @ 20.01/s (n=100)
LWP: 933 wallclock secs
( 4.05 usr + 4.87 sys = 8.92 CPU) @ 11.21/s (n=100)
LWP-Store: 340 wallclock secs
( 4.11 usr + 3.70 sys = 7.81 CPU) @ 12.80/s (n=100)
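One thing that may help with reading these numbers: the "@ .../s" rates are computed from CPU time, not wall-clock time, and every iteration includes the sleep 1 at the end of each sub. A small sketch of the per-download arithmetic, using only the wall-clock totals above (per_iteration() is a hypothetical helper name):

```perl
use strict;
use warnings;

# Wall-clock totals copied from the results above, 100 iterations each.
my %wallclock  = ('FTP' => 4011, 'LWP' => 933, 'LWP-Store' => 340);
my $iterations = 100;
my $sleep      = 1;    # each benchmarked sub ends with "sleep 1"

# Average wall-clock seconds per iteration, excluding the sleep.
sub per_iteration {
    my ($total, $n, $sleep_secs) = @_;
    return $total / $n - $sleep_secs;
}

# Print the methods from fastest to slowest.
for my $name (sort { $wallclock{$a} <=> $wallclock{$b} } keys %wallclock) {
    printf "%-10s %6.2f s per download\n",
        $name, per_iteration($wallclock{$name}, $iterations, $sleep);
}
```

That works out to roughly 2.4 s per download for LWP-Store, 8.3 s for LWP and 39.1 s for FTP, so almost all of the time is spent waiting on the network rather than on the CPU.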
-Tats