Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have a script that's using LWP to do very simple FTP gets and puts. The code for the FTP GET looks something like this:

    #an FTP get
    my $ua = LWP::UserAgent->new();
    $ua->agent("$0/0.1 " . $ua->agent);
    my $req = HTTP::Request->new(GET => $url);
    my $result = $ua->request($req);

And the code for an FTP PUT looks something like this:

    #an FTP put
    #open the file and read the contents
    my $content;
    if (open DATA_READER, "$data_file") {
        $content = join("", <DATA_READER>);
        close DATA_READER;
    }
    else {
        warn "UNABLE TO READ $data_file, upload will be empty! $!\n";
    }
    my $ua = LWP::UserAgent->new();
    $ua->agent("$0/0.1 " . $ua->agent);
    my $req = HTTP::Request->new('PUT', $url, undef, $content);
    my $result = $ua->request($req);

The thing is, I really don't care about the file data itself--in fact, when I do the FTP get, I never actually write the file to disk; I'm just concerned with whether the file was transferred properly or not. Some of the files I have to fetch and put are rather large...around 100-250MB. With the above method, Perl is taking up a huge chunk of memory to buffer the files. Can anyone suggest a way to reduce the memory footprint of this code? For instance, is there a way to feed the file to the FTP put without buffering it into memory first? Likewise, is there a way to flush the buffer in the FTP get as data is coming in? It would be best if I can do this with LWP instead of having to resort to using other modules.

Replies are listed 'Best First'.
Re: Using Less Memory for LWP/FTP
by Corion (Patriarch) on Dec 02, 2003 at 14:21 UTC

    The LWP::UserAgent documentation tells me of two special parameters to get():

    :content_file => $filename
    :content_cb   => \&callback
    The :content_file parameter is for directly saving the content of the response to a file, the :content_cb parameter is for passing the contents directly to a supplied callback.

    You should also take a look at the mirror() and getstore() methods, which might be suitable for your tasks as well.
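    A minimal sketch of the callback approach (the URL, host, and filename are placeholders): each chunk is handed to the callback and then discarded, so only the headers and status stay in memory.

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $url = 'ftp://ftp.example.com/pub/bigfile.bin';   # placeholder
    my $ua  = LWP::UserAgent->new();
    $ua->agent("$0/0.1 " . $ua->agent);

    # The body is streamed to the callback chunk by chunk and never
    # accumulated, so a 100-250MB file stays out of memory.
    my $response = $ua->get($url, ':content_cb' => sub {
        my ($chunk, $resp) = @_;
        # inspect or checksum $chunk here if you like; just don't store it
    });

    if ($response->is_success) {
        print "transfer OK\n";
    }
    else {
        print "failed: ", $response->status_line, "\n";
    }
    ```

    The `:content_file => $filename` form works the same way, except LWP writes each chunk straight to disk instead of handing it to your code.
    
    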

Re: Using Less Memory for LWP/FTP
by iburrell (Chaplain) on Dec 02, 2003 at 21:27 UTC

    Look at mirror() and the second argument to request() in LWP::UserAgent. The second argument controls where the response goes: if it is a scalar, it is used as a filename; if it is a code reference, the callback is called with each block of data as it arrives. For your application, your subroutine can validate and then throw away the data. Also, look at LWP::Protocol::collect.

    my $ua = LWP::UserAgent->new();
    my $request = HTTP::Request->new('PUT', $url, undef, $content);
    my $response = $ua->request($request, \&check_response);
    With Net::FTP, you will have to read from the data socket yourself. The retr command will start the download and return the socket for the data connection. Your code can then read the data and throw it away.
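    A sketch of that with Net::FTP (host, credentials, and filename are placeholders): retr() returns a data connection object whose read() fills a buffer that is simply overwritten on each pass, so memory use stays constant.

    ```perl
    use strict;
    use warnings;
    use Net::FTP;

    my $ftp = Net::FTP->new('ftp.example.com')        # placeholder host
        or die "connect failed: $@";
    $ftp->login('user', 'password')
        or die "login failed: ", $ftp->message;
    $ftp->binary;

    my $data = $ftp->retr('bigfile.bin')
        or die "retr failed: ", $ftp->message;

    my ($buf, $total) = ('', 0);
    while (my $n = $data->read($buf, 64 * 1024)) {    # reuse one 64KB buffer
        $total += $n;                                 # count bytes, keep nothing
    }
    $data->close;
    $ftp->quit;
    print "received $total bytes\n";
    ```
    
    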

    For uploads, you will have to do things differently. But both LWP and Net::FTP will read from local files for uploads and don't need to hold the entire file in memory.
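    For the upload side, a sketch with Net::FTP (host, credentials, and paths are placeholders): put() takes a local filename, or an open filehandle, and streams the file to the server in blocks rather than slurping it first.

    ```perl
    use strict;
    use warnings;
    use Net::FTP;

    my $ftp = Net::FTP->new('ftp.example.com')        # placeholder host
        or die "connect failed: $@";
    $ftp->login('user', 'password')
        or die "login failed: ", $ftp->message;
    $ftp->binary;

    # put() reads the local file block by block itself, so the
    # whole 100-250MB file is never held in memory at once.
    $ftp->put('/path/to/data_file', 'remote_name')
        or die "put failed: ", $ftp->message;
    $ftp->quit;
    ```
    
    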

      >But both LWP and Net::FTP will read from local
      >files for uploads and don't need to hold the
      >entire file in memory.

      How is this done? Do I need to pass a filehandle or a filename into the request or something?
Re: Using Less Memory for LWP/FTP
by Art_XIV (Hermit) on Dec 02, 2003 at 14:14 UTC

    I don't have any ideas about how to reduce your buffering problems with LWP.

    I'm curious about why you didn't/can't use Net::FTP, which is all about file-slinging, as opposed to LWP, which is mostly about content.

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"