ronin78 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,

I have looked around for a solution to this, but I cannot seem to find anything about it.

I would like to know if it is possible to read a file, over FTP (this is a must), line-by-line. The Net::FTP package will allow me to "get" entire files, and the "read" function does not appear to have a parameter for a file. There are other packages, like File::Remote, which allow line-by-line reading of remote files, but do not support FTP.

The problem that I am trying to solve is this. I have a very large list of files (indexed by company and year). Each company and year combination has multiple files associated with it. However, I know that I only want one of these files; I just can't tell which one from the listing. In order to tell if a particular file is the correct one, I need to read the header of the file (the files are .txt, but have HTML information in them which includes a header).

I would like to write a script which iterates over each filename in the list and retrieves each one from the FTP server one line at a time. I can then check each line to see if it contains the header information that I need, and I can exit the read if it does. Alternatively, if line-by-line is not doable, I could extract a fixed number of lines from each file: the header does not always contain the same number of lines, but I could pick a threshold which would work.

These files are individually quite large, so I only want to download the ones that I need. Otherwise, I would download all the files and delete the ones that did not match.

Any help would be appreciated!

Matt

Re: Read part of a file over FTP
by BrowserUk (Patriarch) on May 27, 2011 at 18:29 UTC

    This uses Net::FTP to fetch the first two hundred bytes of a 180k file:

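    #! perl -slw
    use strict;
    use Net::FTP;

    # Connect and log in anonymously.
    my $ftp = Net::FTP->new( 'ftp.software.ibm.com' );
    $ftp->login( 'anonymous', 'anonymous@' );
    $ftp->cwd( '/ftp' );
    $ftp->ascii;

    # retr() starts the transfer and returns a data connection object
    # instead of saving the file locally.
    my $xfr = $ftp->retr( '00_Catalog' );

    # Read only the first 200 bytes from that connection.
    my $buf;
    $xfr->read( $buf, 200 );
    print $buf;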
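
    For the line-by-line variant asked about in the question, the same data connection can be read in small chunks and split into lines as they arrive. The following is only a sketch under assumptions: read_until_match is a hypothetical helper name, the 512-byte chunk size is arbitrary, and the only thing it relies on is the read( BUFFER, SIZE ) method of the object returned by retr().

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: read from an open data connection in small chunks
    # and stop as soon as a complete line matches $pattern, or once roughly
    # $max_bytes have been read. $conn is whatever $ftp->retr() returned;
    # only its read() method is used.
    # Returns ( $found, @lines_seen_so_far ).
    sub read_until_match {
        my ( $conn, $pattern, $max_bytes ) = @_;
        my ( $pending, $total, @lines ) = ( '', 0 );

        while ( $total < $max_bytes ) {
            my $got = $conn->read( my $chunk, 512 );
            last unless $got;    # 0 at end of file, undef on error
            $total   += $got;
            $pending .= $chunk;

            # Peel complete lines off the front of the pending text,
            # tolerating CRLF line endings.
            while ( $pending =~ s/\A([^\n]*?)\r?\n// ) {
                my $line = $1;
                push @lines, $line;
                return ( 1, @lines ) if $line =~ $pattern;
            }
        }

        # Whatever is left is a trailing partial line.
        if ( length $pending ) {
            push @lines, $pending;
            return ( 1, @lines ) if $pending =~ $pattern;
        }
        return ( 0, @lines );
    }
    ```

    With $xfr from the example above, a call might look like my ( $found, @lines ) = read_until_match( $xfr, qr/<TITLE>/i, 4096 ); where both the pattern and the 4096-byte cap are placeholders for whatever identifies the real header.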

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Yes, this is what I wanted! Thanks very much!

      As a follow-up question: the code now works for the first file in my loop, and I get the text that I want, but it fails on the second file with the error: Can't call method "read" on an undefined value. I have verified that the new filename is correct, so I assume that there must be some problem with reinitializing the dataconn object?

      Is there a recommended way to clear a retr() so that a new one can be called? Or is that not even likely to be my problem?

      The relevant code is below. The while loop iterates over the rows returned by a MySQL query. Note that I haven't done anything with the buffer yet.

      while (@results = $filequery->fetchrow()) {
          $filename="/".$filename;
          print "$filename\n";
          $xfr = $ftp->retr($filename);
          $xfr->read($header,1400);
          print "$header\n";
      }

        Note, for the above code, that I have tried the following things to close the retr():
        $ftp->close();
        $xfr->close();
        But no joy. I'm just stabbing in the dark, though, so that's not really surprising.
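
        One way to narrow this down is to check what retr() returns before calling read() on it: the error message means retr() itself returned undef for the second file, and the server's last response should say why. The sketch below is not a confirmed fix. fetch_headers is a hypothetical helper name, $ftp is assumed to be an already logged-in Net::FTP object, and each data connection is closed before the next retr() is issued.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: fetch the first $bytes bytes of each path over
        # one FTP session. Returns a hash mapping each path to its header text,
        # or to undef if retr() failed for that path.
        sub fetch_headers {
            my ( $ftp, $bytes, @paths ) = @_;
            my %header_for;

            for my $path (@paths) {
                my $xfr = $ftp->retr($path);
                unless ($xfr) {
                    # retr() returns undef when the transfer could not be
                    # started; message() holds the server's last response.
                    warn "retr($path) failed: ", $ftp->message;
                    $header_for{$path} = undef;
                    next;
                }

                my $header = '';
                # read() may return fewer bytes than requested, so loop.
                while ( length($header) < $bytes ) {
                    my $got = $xfr->read( my $chunk, $bytes - length($header) );
                    last unless $got;    # 0 at end of file, undef on error
                    $header .= $chunk;
                }
                $header_for{$path} = $header;

                # End this transfer before starting the next one.
                $xfr->close;
            }
            return %header_for;
        }
        ```

        In the loop above this would be called with the filenames collected from the query, for example my %header_for = fetch_headers( $ftp, 1400, @filenames ); where @filenames is a placeholder for that list. Any warning printed for the second file then shows the server's actual complaint.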