in reply to HTTP GET without LWP

Many servers won't like getting a request that includes the 'http://hostname' part of the URL. Typically only proxy servers actually accept that. You might have better luck if you change:
m{^http://(.*?)/} #to m{^http://(.*?)(/.*)$}
...and then set $url = $2 so you only request the URI (/authenticate.cgi). Before anyone else lays into me, this is a very rough solution and doesn't necessarily take everything into account. This is really what LWP is designed for, as it will use RFC-compliant methods to parse the URL instead of this quick and dirty stuff.
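For what it's worth, here is a minimal sketch of that split, assuming a plain http:// URL with an optional port. The name split_url is hypothetical and not part of the original script, and it is still the quick and dirty approach, not a substitute for a real URL parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split an http:// URL into host, port and path.
# Rough on purpose; it ignores userinfo, fragments and https.
sub split_url {
    my ($url) = @_;
    return unless $url =~ m{^http://([^/:]+)(?::(\d+))?(/.*)?$};
    my ($host, $port, $path) = ($1, $2, $3);
    $port = 80  unless defined $port;                  # default HTTP port
    $path = '/' unless defined $path && length $path;  # bare host means "/"
    return ($host, $port, $path);
}

my ($host, $port, $path) =
    split_url('http://login.gatorlink.ufl.edu/authenticate.cgi');
print "$host $port $path\n";
```

Only $path would go on the GET line; $host and $port are what you connect to.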

Update: It may be a virtual server, in which case you also need to send the Host: header in your request, like this:
print $remote "GET $url HTTP/1.0\015\012Host: $host" . $BLANK;
I strongly suggest splitting the URI, too.
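A rough sketch of building such a request as a string, assuming HTTP/1.0 and nothing beyond the Host: header. The name build_get_request and the $EOL variable are illustrative choices, not taken from the original script. HTTP lines are terminated with CRLF, so the sketch writes "\015\012" explicitly instead of relying on "\n".

```perl
use strict;
use warnings;

my $EOL = "\015\012";   # CRLF, the line terminator HTTP expects

# Hypothetical helper: return a complete GET request for one path,
# including the Host: header that name-based virtual hosts need.
sub build_get_request {
    my ($host, $path) = @_;
    return join($EOL,
        "GET $path HTTP/1.0",
        "Host: $host",
    ) . $EOL . $EOL;    # the empty line ends the header block
}

print build_get_request('login.gatorlink.ufl.edu', '/authenticate.cgi');
```

The returned string can be printed straight to the socket handle in place of the hand-assembled request.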

--isotope
http://www.skylab.org/~isotope/

(unfortunately...) Re (2): HTTP GET without LWP
by mwp (Hermit) on Jan 13, 2001 at 04:30 UTC
    That was my first thought, but breaking the URL into host and URI didn't solve the issue. If I figure anything else out, I'll post it here.

    Update: sutch seems to have hit the nail on the head. If you send "GET /authenticate.cgi HTTP/1.0" alone it errors out. The key is attaching "Host: login.gatorlink.ufl.edu" to the end of the request, before your $BLANK variable.

    while (my $url = shift @ARGV) {
        # Split the URL into host and path; anything else is rejected.
        unless ($url =~ m{^http://([A-Za-z0-9\.\-]+)/(.*)$}) {
            print "$0: invalid url: $url\n";
            next;
        }
        my ($host, $uri) = ($1, $2);
        my $remote = IO::Socket::INET->new(Proto    => "tcp",
                                           PeerAddr => $host,
                                           PeerPort => "http(80)");
        unless ($remote) { die "Cannot connect to http daemon on $host\n" }
        $remote->autoflush(1);
        # The Host: header is what lets the virtual host answer.
        # $BLANK and $sep are assumed to be defined as in the original script.
        print $remote "GET /$uri HTTP/1.0\015\012Host: $host" . $BLANK;
        print while (<$remote>);
        print "\n$sep";
        close $remote;
    }
    Your end result might look something like that. You really should just use LWP. =)

    'kaboo

Servers not liking 'http://hostname'
by bbfu (Curate) on Jan 13, 2001 at 04:32 UTC

    Yes, I'd thought of that and tried it but it doesn't seem to work any better. =(

    All the pages I've tried it on accept the full URL but since you say many won't like it, I'll change it. It seems to work both ways for the ones that work. Unfortunately:

    $ ./gethttp 'http://login.gatorlink.ufl.edu/authenticate.cgi'
    HTTP/1.1 404 Not Found
    Date: Fri, 12 Jan 2001 23:27:55 GMT
    Server: Apache/1.3.6 (Unix) mod_perl/1.19 mod_ssl/2.2.8 OpenSSL/0.9.2b
    Connection: close
    Content-Type: text/html

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>404 Not Found</TITLE>
    </HEAD><BODY>
    <H1>Not Found</H1>
    The requested URL /authenticate.cgi was not found on this server.<P>
    </BODY></HTML>

    Thanks for your help, though!