Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to fetch info from an FTP site and find out if the data has changed. I have done this with no problems on HTML pages, but I can't seem to fetch the page source code for the FTP listing.

When I fetch the contents of the Symantec anti-virus FTP directory, I get the Unix filesystem listing instead of the HTML page source code. Is there anything different I need to do to fetch the contents of an FTP site?
use strict;
use warnings;
use LWP::Simple;

my $url = 'ftp://ftp.symantec.com/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/';
my $content = get($url);
print "$content\n";    # prints the Unix filesystem listing
Output:
total 230628
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4  987053 Jan 23 15:53 20040123-007-i32-1.exe
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4 1325979 Jan 23 15:53 20040123-007-i32-2.zip
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4 1137726 Jan 23 15:53 20040123-007-i32-3.zip
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4 1137848 Jan 23 15:53 20040123-007-i32-4.zip
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4 4599122 Jan 23 15:53 20040123-007-i32.exe
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4 3109802 Jan 23 15:53 20040123-007-o32.zip
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4 8658502 Jan 23 15:53 20040123-007-unix.sh
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4 7637800 Jan 23 15:53 20040123-007-x86.exe
-rwxrwxr-x 1 hevasymantec-ftp4 hevasymantec-ftp4  987048 Jan 26 18:38 20040126-021-i32-1.exe
.....more lines etc...
I was hoping to get the HTML page version, something like this:
<HTML><HEAD><TITLE>Directory listing for /public/english_us_canada/antivirus_definitions/norton_antivirus_corp/</TITLE></HEAD>
<BODY><h2>Current directory is /public/english_us_canada/antivirus_definitions/norton_antivirus_corp/</h2><BR><HR>
<DD><A HREF="ftp://ftp.symantec.com:21/public/english_us_canada/antivirus_definitions/"><IMG SRC="http://164.214.7.246:80/-http-gw-internal-/menu.gif">&nbsp;..</A></DD>
<DD><A HREF="ftp://ftp.symantec.com:21/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/20040123-007-i32-1.exe"><IMG SRC="http://164.214.7.246:80/-http-gw-internal-/blank.gif">&nbsp;20040123-007-i32-1.exe</A></DD>
<DD><A HREF="ftp://ftp.symantec.com:21/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/20040123-007-i32-2.zip"><IMG SRC="http://164.214.7.246:80/-http-gw-internal-/binary.gif">&nbsp;20040123-007-i32-2.zip</A></DD>
<DD><A HREF="ftp://ftp.symantec.com:21/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/20040123-007-i32-3.zip
...etc....
</BODY>
</HTML>

Replies are listed 'Best First'.
Re: fetching ftp site info
by stvn (Monsignor) on Jan 27, 2004 at 15:45 UTC

    You are getting a directory listing because that is what you are asking for; the end of your path is a directory, not a file:

    my $url = 'ftp://ftp.symantec.com/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/';
    You will most likely not be able to get the HTML page you seek, as it appears to be a directory listing generated dynamically by either the FTP server or your browser. (I don't know what kind of FTP server Symantec uses, or what OS and browser version you used to get that HTML, so I couldn't really say, although I am leaning towards Windows and IE, which would mean your browser generated it.) Notice the TITLE:
    <TITLE>Directory listing for /public/english_us_canada/antivirus_definitions/norton_antivirus_corp/</TITLE>
    Just changing ftp:// to http:// would, I suspect, not work, since ftp.symantec.com is likely not going to respond to HTTP requests, and my (strong) suspicion is that your browser is generating the HTML.

    You should try Net::FTP and check modification dates on the files you are wondering about. Since I don't know the criteria you want to use to test how old or new these files are, I can't help you there, but you should be able to find plenty of date comparison and manipulation modules on CPAN which will surely suit your needs.
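
    A minimal sketch of that approach (assuming anonymous login is accepted on the Symantec server; the directory path comes from the question, everything else is illustrative):

    use strict;
    use warnings;
    use Net::FTP;

    my $host = 'ftp.symantec.com';
    my $dir  = '/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/';

    my $ftp = Net::FTP->new($host) or die "Cannot connect to $host: $@";
    $ftp->login('anonymous', 'anonymous@') or die "Cannot login: ", $ftp->message;
    $ftp->cwd($dir)                        or die "Cannot cwd to $dir: ", $ftp->message;

    # mdtm() asks the server for a file's modification time (epoch seconds)
    for my $file ($ftp->ls) {
        my $mtime = $ftp->mdtm($file);
        next unless defined $mtime;
        printf "%s  last modified %s\n", $file, scalar localtime($mtime);
    }
    $ftp->quit;

    Storing the mdtm values between runs and comparing them on the next run is one simple way to tell whether anything has changed.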

    -stvn
      thanks to both of you for leading me in the right direction!
Re: fetching ftp site info
by b10m (Vicar) on Jan 27, 2004 at 14:05 UTC

    I don't really understand the problem and would choose the UNIX file listing over some HTML any day, for it's way easier to parse.

    Couldn't you get the time of each file and figure out whether it is new enough? You could use Date::Manip, for example, to check whether it is, say, "older than one week ago".
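
    A rough sketch of that idea, assuming the listing has already been fetched into $content as in the original post; the regex that pulls the month, day, time, and filename out of each line is a guess at the format shown above and may need adjusting:

    use strict;
    use warnings;
    use Date::Manip;

    # Keep files whose listing timestamp is less than one week old.
    my $one_week_ago = time() - 7 * 24 * 60 * 60;

    for my $line (split /\n/, $content) {
        # typical line: -rwxrwxr-x 1 owner group 987053 Jan 23 15:53 20040123-007-i32-1.exe
        next unless my ($mon, $day, $hhmm, $name) =
            $line =~ /(\w{3})\s+(\d+)\s+([\d:]+)\s+(\S+)\s*$/;
        my $epoch = UnixDate(ParseDate("$mon $day $hhmm"), '%s') or next;
        print "$name looks new\n" if $epoch > $one_week_ago;
    }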

    Could you please state exactly what your problem is, as it is still unclear to me.

    --
    b10m
      Sorry about not being clear. When I fetch the FTP site and put the result in a local text file, I get the Unix file system listing. I want to fetch the FTP site into my local text file and get the HTML source code instead.
        I believe the reason is this:

        You will get the HTML source code only when you use an HTML client to view an FTP site. FTP by itself is doing what it is supposed to do: give you a listing. You are using a screwdriver to hammer a nail, and then getting surprised that it is going straight in instead of turning.

        In your code, try using the http:// protocol instead of ftp:// and you should get the HTML source code.

        The HTML source is created by your client. You can do a couple of things (not involving Perl), like:
        $ wget -O temp.html 'ftp://ftp.symantec.com/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/'
        or
        $ lynx -source 'ftp://ftp.symantec.com/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/' > temp.html
        and they will create the HTML source for you.
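
        If you would rather stay in Perl, one possibility is simply to shell out to lynx and capture what it generates (this assumes lynx is installed and on your PATH):

        use strict;
        use warnings;

        my $url  = 'ftp://ftp.symantec.com/public/english_us_canada/antivirus_definitions/norton_antivirus_corp/';
        my $html = `lynx -source '$url'`;    # lynx builds the HTML listing itself
        die "lynx failed (exit status $?)\n" if $?;
        print $html;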

        I tried using lwp-request but it just gave me the unix listing.

        --

        flounder