PerlMonks  

Unable to complete download with Net::FTP

by Itatsumaki (Friar)
on Nov 28, 2003 at 21:45 UTC

Itatsumaki has asked for the wisdom of the Perl Monks concerning the following question:

I found a piece of old code that I had written that goes to an FTP site and retrieves a file via LWP::Simple:

my $data = get('ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz');
my $outfile = '>GO_TERMS.CSV';
if (!$data) { exit(); }
open(OUT, '>LL_tmpl.gz');
binmode OUT;
print OUT $data;
close(OUT);

I thought: hey, that's silly because it loads the whole file into a variable. Why not rewrite it with Net::FTP? That should be shorter, clearer code, as well as much faster.

use Net::FTP;
my $ftp = Net::FTP->new('ftp.ncbi.nih.gov', Debug=>0);
$ftp->login('anonymous', 'anon@anon.com');
$ftp->cwd('/refseq/LocusLink/');
$ftp->type('binary');
$ftp->get('LL_tmpl.gz');
$ftp->quit();

The problem is that when I run the file downloaded via Net::FTP through gzip -t, it reports a corrupt archive. When I fetch it with LWP::Simple, the archive seems valid. I've reproduced this on a couple of machines, but they are all behind the same router. Can anyone validate this from other machines? Any ideas what's causing the FTP download to be invalid?

-Tats

Update: Warning! The downloaded file is rather large (~30 MB). Thanks to b10m for pointing it out.
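For anyone who wants to run the same integrity check from inside Perl instead of shelling out to gzip -t, here is a minimal sketch using the core IO::Uncompress::Gunzip module. The helper name gzip_ok is made up for illustration and the demo data is synthetic; nothing here comes from the original post.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: returns 1 if the bytes in $$data_ref decompress
# cleanly as a gzip stream, 0 otherwise (roughly what `gzip -t` checks).
sub gzip_ok {
    my ($data_ref) = @_;
    my $sink;
    # Transparent => 0: refuse to pass non-gzip input through unchanged.
    return gunzip($data_ref => \$sink, Transparent => 0) ? 1 : 0;
}

# Self-contained demo: build a small archive in memory, then damage it
# by cutting it in half to simulate an incomplete or mangled download.
my $plain = "some annotation line\n" x 1000;
gzip(\$plain => \my $good) or die "gzip failed: $GzipError";
my $truncated = substr($good, 0, int(length($good) / 2));

print "intact archive:    ", (gzip_ok(\$good)      ? "ok" : "corrupt"), "\n";
print "truncated archive: ", (gzip_ok(\$truncated) ? "ok" : "corrupt"), "\n";
```

To check a file on disk, gunzip() also accepts a filename as its input argument. For a 30 MB archive it is probably better to send the output to a scratch file than to an in-memory scalar.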

Replies are listed 'Best First'.
Re: Unable to complete download with Net::FTP
by holo (Monk) on Nov 28, 2003 at 22:00 UTC

    I would check the return value of each $ftp->... call, just in case something is going wrong:

    use strict;
    use warnings 'all';
    use Net::FTP;

    my $host = 'ftp.ncbi.nih.gov';

    # If new() fails there is no object yet, so report $@ rather than $ftp->message
    my $ftp = Net::FTP->new($host, Debug=>0)
        or die "Cannot connect to $host: $@";
    $ftp->login('anonymous', 'anon@anon.com')
        or die "Cannot login ", $ftp->message;
    $ftp->cwd('/refseq/LocusLink/')
        or die "Cannot cwd ", $ftp->message;
    $ftp->type('binary')
        or die "BINARY failed ", $ftp->message;
    $ftp->get('LL_tmpl.gz')
        or die "Cannot get file ", $ftp->message;
    $ftp->quit; # well ... almost every call ;)

    Take a look at the Net::FTP docs.

      Ahh, good catch. It fails on the $ftp->type() line, which is probably the source of my problem. I was able to fix it by using $ftp->binary() instead of $ftp->type(). Thanks!
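The fix described above could look roughly like the sketch below. My understanding, which is worth checking against the Net::FTP docs, is that type() passes its argument straight through to the FTP TYPE command, and the server expects a code such as I (image/binary) or A (ASCII) rather than the word "binary"; binary() is the convenience method that sends the right code. The fetch_binary wrapper and the command-line handling are invented for illustration; the host, path, and login come from the original post.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical wrapper: fetch one file in binary mode, dying on any failure.
sub fetch_binary {
    my ($host, $dir, $file) = @_;

    # On a failed connect there is no object yet, so the error is in $@.
    my $ftp = Net::FTP->new($host, Debug => 0)
        or die "Cannot connect to $host: $@";

    $ftp->login('anonymous', 'anon@anon.com')
        or die "Cannot login: ", $ftp->message;
    $ftp->cwd($dir)
        or die "Cannot cwd to $dir: ", $ftp->message;
    $ftp->binary
        or die "Cannot switch to binary mode: ", $ftp->message;
    my $local = $ftp->get($file)
        or die "Cannot get $file: ", $ftp->message;
    $ftp->quit;

    return $local;   # name of the local file that was written
}

# Usage: perl fetch.pl ftp.ncbi.nih.gov /refseq/LocusLink/ LL_tmpl.gz
# Nothing is downloaded unless all three arguments are given.
if (@ARGV == 3) {
    my $saved = fetch_binary(@ARGV);
    print "Saved $saved\n";
}
```

Checking every call is what exposed the problem in this thread: without the "or die", a failed mode switch goes unnoticed and the transfer presumably stays in the default ASCII mode.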

Re: Unable to complete download with Net::FTP
by bart (Canon) on Nov 29, 2003 at 00:47 UTC
    I thought: hey, that's silly because it loads the whole file into a variable.
    If you use LWP::Simple's getstore() function instead, it'll save the file directly to disk, in chunks of a few k.

    Why not rewrite it with Net::FTP? That should be shorter, clearer code, as well as much faster.
    Huh?!?
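A minimal sketch of the getstore() approach mentioned above, assuming LWP::Simple is installed. The store_url wrapper and the command-line handling are hypothetical; getstore() and is_success() are the documented LWP::Simple exports.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical wrapper around getstore(): save $url to $file on disk and
# die unless the returned status code indicates success.
sub store_url {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);   # returns the response status code
    is_success($status)
        or die "Download of $url failed with status $status\n";
    return $file;
}

# Usage: perl store.pl ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz LL_tmpl.gz
# Nothing is downloaded unless both arguments are given.
if (@ARGV == 2) {
    my $saved = store_url(@ARGV);
    print "Saved $saved\n";
}
```

This keeps the one-call convenience of LWP::Simple while avoiding the 30 MB scalar that the original get() version builds up in memory.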

      Given that I hadn't noticed the getstore function, I found the FTP-based implementation above to be much superior. Three reasons why:

      1. It's shorter code: fewer lines and characters
      2. It's clearer code: it avoids routing the download through an intermediate variable
      3. It's faster code: I noticed this empirically, and I imagine the difference comes from saving everything into one 30 MB variable.

      I guess you think the LWP version is clearer to read? After I get some sleep I'll benchmark the three approaches (Net::FTP, LWP::Simple::get(), and LWP::Simple::getstore()) and see what shakes out there.

      -Tats

        After I get some sleep I'll benchmark the three approaches
        You could certainly do that, but I believe you would be better off continuing with the method you find easiest, clearest, and most suitable. The difference in execution time between the various methods is likely (almost guaranteed) to be negligible, whereas using a method/module you find unintuitive will slow you down.
        Your time is worth more than a few seconds of processor execution time.


        davis
        It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
        Update: Minor text edit; title change

        Here's the benchmark. I'd love some help interpreting it, because I don't know what to make of this. Visually, using an LWP get() used up the most memory, but I can't grok the huge difference in wall-clock time. Incidentally, to avoid spamming my favourite genomic-annotation provider I tested a much smaller file (about 10k). I don't think I could really run a test with more than 10 iterations on any of the bigger files, so if FTP has a long connect lag at the front, a larger file might make it more competitive.

        The code:

        use strict;
        use Benchmark;
        use Net::FTP;
        use LWP::Simple;

        sub lwp_simple {
            my $data = get('ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL.out_xl.gz');
            my $outfile = '>GO_TERMS.CSV';
            if (!$data) { }
            open(OUT, '>LL_tmpl.gz');
            binmode OUT;
            print OUT $data;
            close(OUT);
            sleep 1;
        }

        sub net_ftp {
            my $ftp;
            if (!($ftp = Net::FTP->new('ftp.ncbi.nih.gov', Debug=>0))) {
                print "Couldn't log-in";
                return;
            };
            $ftp->login('anonymous', 'anon@anon.com');
            $ftp->cwd('/refseq/LocusLink/');
            $ftp->type('binary');
            $ftp->get('LL.out_xl.gz');
            $ftp->quit();
            sleep 1;
        }

        sub lwp_getstore {
            my $url = 'ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL.out_xl.gz';
            my $file = 'LL.out_xl.gz';
            getstore($url, $file);
            sleep 1;
        }

        timethese(100, {
            'LWP'       => \&lwp_simple,
            'FTP'       => \&net_ftp,
            'LWP-Store' => \&lwp_getstore
        } );

        The results:

        Benchmark: timing 100 iterations of FTP, LWP, LWP-Store...
              FTP: 4011 wallclock secs ( 2.31 usr + 2.68 sys = 5.00 CPU) @ 20.01/s (n=100)
              LWP: 933 wallclock secs ( 4.05 usr + 4.87 sys = 8.92 CPU) @ 11.21/s (n=100)
        LWP-Store: 340 wallclock secs ( 4.11 usr + 3.70 sys = 7.81 CPU) @ 12.80/s (n=100)
        -Tats
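One way to read those numbers: the "@ N/s" rates appear to be computed from CPU time only (100 / 5.00 CPU seconds gives the 20/s shown for FTP), so they say nothing about time spent waiting on the network. The sketch below works out the per-iteration wall-clock time from the totals above, with and without the one-second sleep that each benchmarked sub contains. The per_iteration helper is made up for illustration, and the sketch assumes the sleep is counted in the wall-clock total.

```perl
use strict;
use warnings;

# Wall-clock and CPU totals copied from the benchmark output above.
my %results = (
    'FTP'       => { wall => 4011, cpu => 5.00 },
    'LWP'       => { wall => 933,  cpu => 8.92 },
    'LWP-Store' => { wall => 340,  cpu => 7.81 },
);
my $iterations = 100;   # n=100 in the output
my $sleep_secs = 1;     # each benchmarked sub ends with `sleep 1`

# Hypothetical helper: per-iteration wall-clock time, with and without
# the fixed sleep.
sub per_iteration {
    my ($wall_total) = @_;
    my $wall = $wall_total / $iterations;
    return ($wall, $wall - $sleep_secs);
}

for my $name (sort { $results{$a}{wall} <=> $results{$b}{wall} } keys %results) {
    my ($wall, $net) = per_iteration($results{$name}{wall});
    printf "%-9s %6.2f s/iter wall, %6.2f s/iter excluding sleep, %4.2f s/iter CPU\n",
        $name, $wall, $net, $results{$name}{cpu} / $iterations;
}
```

On these figures every approach spends well under a tenth of a second of CPU per download, so nearly all of the elapsed time is waiting on the connection and the transfer, not Perl-side work.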
Re: Unable to complete download with Net::FTP
by pg (Canon) on Nov 28, 2003 at 21:58 UTC

    Check return code for each step.

      I thought binmode was a property of a file-handle, and for ftp I figured the line:

      $ftp->type('binary');

      was doing that? Where/what should I binmode? Also, if I look at the downloaded file (e.g. type LL_tmpl.gz) it is definitely binary, not ascii.
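On where binmode applies: as far as I understand it, binmode is indeed a property of a local Perl filehandle, which is why the LWP::Simple version (which writes the file itself) needs it, while the FTP transfer type is a separate setting negotiated with the server. The sketch below only illustrates the local half, using a temporary file; the roundtrip_binary helper and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write raw bytes to a local file with binmode set,
# then read them back. binmode only affects these local filehandles; it
# has no say in how an FTP server encodes the data it sends.
sub roundtrip_binary {
    my ($bytes) = @_;
    my ($out, $path) = tempfile(UNLINK => 1);
    binmode $out;
    print {$out} $bytes;
    close $out or die "close failed: $!";

    open my $in, '<', $path or die "Cannot reopen $path: $!";
    binmode $in;
    local $/;                 # slurp the whole file
    my $copy = <$in>;
    close $in;
    return $copy;
}

my $sample = "\x1f\x8b\x08\x00\r\n\x00\xff";   # gzip magic bytes plus a CRLF pair
my $copy   = roundtrip_binary($sample);
print $copy eq $sample ? "bytes preserved\n" : "bytes altered\n";
```

Getting the local side right does not help if the server was never told to send binary, which is why a failed transfer-type call can still leave a corrupt file on disk.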

      Boy, do I hate when people change their posts without specifying what the update is ... wasn't your original post "use binmode"? Nothing more nothing less? Why did you change it?

      Update: If you didn't change it, Itatsumaki's comment would actually make sense ...

      --
      B10m
