Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to retrieve a page that is CGI-generated; the URL is something like: http://www.wsvr.com/cgi-bin/script.pl?arg1=blah&arg2=arb. I've tried wget and lynx, but they don't retrieve the page that appears when I put the URL in my web browser (the script spits back a .CSV file). How best to retrieve it? LWP? Ta'

Replies are listed 'Best First'.
Re: Retrieve a CGI page
by dani++ (Sexton) on Aug 31, 2001 at 14:58 UTC
    I've written a fairly sophisticated HTML spider using Perl, lynx and tcsh (as glue); all the pages it accesses are CGIs, and it works as advertised. Have you tried 'lynx --source' or 'lynx --dump' as suggested?

    I've refrained from using LWP as the target CGI system required cookies, sessions and full browser support.

    Moreover, Lynx has a limited script option '-cmd_script=<script file>' that you can use to program what it does (download files, etc). Use '-cmd_log=<script log file>' to learn the syntax of the script files.

    In my case I first download the pages, use perl to parse and analyse them, build a custom lynx script to retrieve exactly the data I want and run lynx again with the generated script.
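    The first two steps of that workflow can be sketched in Perl (the URL is the hypothetical one from the question, and the snippet assumes lynx is on the PATH; the link-collecting regex is only a stand-in for real parsing):

```perl
use strict;
use warnings;

# Hypothetical URL, shaped like the one in the question.
my $url = 'http://www.wsvr.com/cgi-bin/script.pl?arg1=blah&arg2=arb';

# First pass: download the raw page source with lynx.
my $page = `lynx -source '$url'`;
die "lynx -source failed\n" if $? != 0;

# Parse/analyse the page in Perl; as a trivial stand-in, collect
# the link targets out of the HTML.
my @links = $page =~ /href="([^"]+)"/gi;
print "$_\n" for @links;

# A custom -cmd_script file would then be generated from @links and
# fed back to lynx; its syntax is best learned from a -cmd_log run.
```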

    dani++

Re: Retrieve a CGI page
by Beatnik (Parson) on Aug 31, 2001 at 14:29 UTC
    Retrieving a CGI-generated page normally isn't different from retrieving a static page. Check QandASection: HTTP and FTP clients for several examples (this question has been asked dozens of times before).
    For the record: YES, LWP::Simple would be a way, but shell calls to wget or lynx --dump are also pretty common.
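    For completeness, a minimal LWP::Simple sketch (the URL is the hypothetical one from the question, and the output file name is made up):

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical URL from the question; the query string is passed
# exactly as a browser would send it.
my $url = 'http://www.wsvr.com/cgi-bin/script.pl?arg1=blah&arg2=arb';

# getstore() performs the GET and writes the response body (the
# CSV) straight to a local file; it returns the HTTP status code.
my $status = getstore($url, 'result.csv');
die "GET failed with status $status\n" unless is_success($status);
```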

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: Retrieve a CGI page
by George_Sherston (Vicar) on Aug 31, 2001 at 14:07 UTC
    This is a chunk I bodged together a while ago which might help. It gets the page generated by the URL in $location and saves it on my local machine in $file. Please be warned that I don't really know what IO::Socket does, so your mileage may vary (or even the wheels may fall off).
    use IO::Socket;
    use URI;

    my $url   = URI->new($location);
    my $host  = $url->host;
    my $port  = $url->port || 80;
    my $path  = $url->path || "/";
    my $query = $url->query;
    $path .= "?$query" if defined $query;

    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "cannot connect\n";
    $socket->autoflush(1);

    # HTTP/1.0 with CRLF line endings, so the server closes the
    # connection when it is done (HTTP/1.1 would keep it open and
    # the read loop below would hang).
    print $socket "GET $path HTTP/1.0\r\n", "Host: $host\r\n\r\n";

    open(SAVE, ">$file") or die "cannot open $file: $!\n";
    print SAVE while <$socket>;    # note: the saved file includes the response headers
    $socket->close;
    close SAVE;


    § George Sherston