Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to retrieve a page that is CGI-generated; the URL is something like: http://www.wsvr.com/cgi-bin/script.pl?arg1=blah&arg2=arb. I've tried wget and lynx, but they don't retrieve the page that appears when I put the URL in my web browser (the script spits back a .CSV file). How best to retrieve it? LWP? Ta'

Replies are listed 'Best First'.
Re: Retrieve a CGI page
by dani++ (Sexton) on Aug 31, 2001 at 14:58 UTC
    I've written a fairly sophisticated HTML spider using Perl, lynx and tcsh (as glue); all the pages it accesses are CGIs, and it works as advertised. Have you tried 'lynx --source' or 'lynx --dump' as suggested?

    I've refrained from using LWP as the target CGI system required cookies, sessions and full browser support.

    Moreover, Lynx has a limited script option '-cmd_script=<script file>' that you can use to program what it does (download files, etc). Use '-cmd_log=<script log file>' to learn the syntax of the script files.

    In my case I first download the pages, use perl to parse and analyse them, build a custom lynx script to retrieve exactly the data I want and run lynx again with the generated script.
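    The first two steps of that workflow can be sketched in Perl (the URL is the hypothetical one from the question, and the snippet assumes lynx is on the PATH; the link-collecting regex is only a stand-in for real parsing):

```perl
use strict;
use warnings;

# Hypothetical URL, shaped like the one in the question.
my $url = 'http://www.wsvr.com/cgi-bin/script.pl?arg1=blah&arg2=arb';

# First pass: download the raw page source with lynx.
my $page = `lynx -source '$url'`;
die "lynx -source failed\n" if $? != 0;

# Parse/analyse the page in Perl; as a trivial stand-in, collect
# the link targets out of the HTML.
my @links = $page =~ /href="([^"]+)"/gi;
print "$_\n" for @links;

# A custom -cmd_script file would then be generated from @links and
# fed back to lynx; its syntax is best learned from a -cmd_log run.
```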

    dani++

Re: Retrieve a CGI page
by Beatnik (Parson) on Aug 31, 2001 at 14:29 UTC
    Retrieving a CGI-generated page normally isn't different from retrieving a static page. Check QandASection: HTTP and FTP clients for several examples (this question has been asked dozens of times before).
    For the record: YES, LWP::Simple would be a way, but shell calls to wget or lynx --dump are also pretty common.
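    For completeness, a minimal LWP::Simple sketch (the URL is the hypothetical one from the question, and the output file name is made up):

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical URL from the question; the query string is passed
# exactly as a browser would send it.
my $url = 'http://www.wsvr.com/cgi-bin/script.pl?arg1=blah&arg2=arb';

# getstore() performs the GET and writes the response body (the
# CSV) straight to a local file; it returns the HTTP status code.
my $status = getstore($url, 'result.csv');
die "GET failed with status $status\n" unless is_success($status);
```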

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: Retrieve a CGI page
by George_Sherston (Vicar) on Aug 31, 2001 at 14:07 UTC
    This is a chunk I bodged together a while ago which might help. It gets the page generated by the URL in $location and saves it on my local machine in $file. Please be warned that I don't really know what IO::Socket does, so your mileage may vary (or even the wheels may fall off).
    use IO::Socket;
    use URI;

    my $url   = URI->new($location);
    my $host  = $url->host;
    my $port  = $url->port || 80;
    my $path  = $url->path || "/";
    my $query = $url->query;
    $path .= "?$query" if defined $query;

    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "cannot connect\n";
    $socket->autoflush(1);

    # HTTP/1.0 with CRLF line endings, so the server closes the
    # connection when it is done (HTTP/1.1 would keep it open and
    # the read loop below would hang).
    print $socket "GET $path HTTP/1.0\r\n", "Host: $host\r\n\r\n";

    open(SAVE, ">$file") or die "cannot open $file: $!\n";
    print SAVE while <$socket>;    # note: the saved file includes the response headers
    $socket->close;
    close SAVE;


    § George Sherston