The following isn't really the answer to your question, but when I read the title, I thought of something else. I might as well reply, since you never know who will hit this page from a search engine, and maybe this will be useful.
You see, to request a page from a web site, at the bare minimum you have to open a socket to port 80 of the remote machine and send something like:
GET / HTTP/1.0
... followed by an extra newline to tell the remote server you aren't qualifying the request with any further headers (I'm glossing over the definition of a newline here; strictly speaking, HTTP wants CR LF). Still, that is sufficient for a basic page from a basic server.
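For the curious, here is a minimal sketch of that bare-minimum exchange, using only the core IO::Socket::INET module. The command-line usage is made up for illustration. Also note that many servers today host several sites on one address and will want a Host: header too; this deliberately leaves it out to match the request above.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# The bare-minimum HTTP/1.0 request: the request line, then an empty line.
# HTTP wants CR LF line endings, so that is two CR LF pairs in a row.
sub build_request {
    my ($path) = @_;
    return "GET $path HTTP/1.0\r\n\r\n";
}

# Connect to port 80, send the request and return the server's whole reply
# (status line, headers and body). An HTTP/1.0 server closes the connection
# when it is done, so reading until EOF collects everything.
sub raw_get {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "cannot connect to $host: $@\n";
    binmode $sock;
    print {$sock} build_request($path);
    my $reply = do { local $/; <$sock> };    # slurp until the server hangs up
    close $sock;
    return defined $reply ? $reply : '';
}

# Hypothetical usage: perl rawget.pl www.example.com /
if (@ARGV) {
    my ($host, $path) = @ARGV;
    print raw_get($host, defined $path ? $path : '/');
}
```

The reply comes back raw, status line and headers included, which is handy for seeing exactly what the server says before any library tidies it up.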
That said, there are times when you come across a server for which this is not enough. Maybe it insists on a particular version of Microsoft IE or Netscape Navigator (these days, that's getting rarer). Or it wants some other piece of information, because the server is trying to distinguish between programs (such as those one might write in Perl) and humans sitting behind browsers clicking on buttons.
When this happens, you really do have to "fake" a web browser in Perl. To do so, you have to send more information along with your request, which will hopefully slide under the radar so that the server thinks it's talking to just another user clicking away in a browser.
The last time I had to do this was, according to the date on the script, 1999-11-04. I no longer recall what I needed it for, but I did name the script sneakyget :)
#! /usr/local/bin/perl -w
use strict;
use HTTP::Request;
use LWP::UserAgent;
$|++;
my $URL = shift or die "no url on command line\n";
my $ua = LWP::UserAgent->new;
# pretend to be Netscape 4.7 on Windows 95
$ua->agent('Mozilla/4.7 [en] (Win95; I)');
my $r = HTTP::Request->new( GET => $URL );
# the extra headers that browser sent, as seen with the echo script below
$r->header( Accept          => [qw{image/gif image/x-xbitmap image/jpeg image/pjpeg image/png */*}],
            Accept_Charset  => [qw{iso-8859-1 * utf-8}],
            Accept_Encoding => 'gzip',
            Accept_Language => 'en',
            Connection      => 'Keep-Alive',
);
my $res = $ua->request( $r );
# we advertised gzip, so undo any Content-Encoding before printing
print $res->decoded_content( charset => 'none' );
warn $res->status_line, "\n";
This was sufficient at the time for my nefarious purposes. Of course, these days one might have to update it a little with a more current OS and browser in the User-Agent string. The main point is that you can indeed "fake" a web browser with Perl.
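Here is a minimal sketch of the same trick with a present-day browser identity. It swaps in HTTP::Tiny (shipped with Perl since 5.14) in place of LWP, so nothing extra needs installing. The User-Agent and Accept strings are only examples of the shape a current Firefox sends, not values to rely on; substitute whatever your own browser reports.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Example values only: copy the real ones from your own browser
# (the echo CGI script below is one way to find out what it sends).
my $BROWSER_AGENT =
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0';
my %BROWSER_HEADERS = (
    'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language' => 'en-US,en;q=0.5',
);

# A client that sends the browser-like headers on every request.
sub make_client {
    return HTTP::Tiny->new(
        agent           => $BROWSER_AGENT,
        default_headers => {%BROWSER_HEADERS},
        timeout         => 15,
    );
}

# Hypothetical usage: perl sneakyget2.pl http://www.example.com/
if (@ARGV) {
    my $res = make_client()->get( $ARGV[0] );
    print $res->{content} if $res->{success};
    warn "$res->{status} $res->{reason}\n";
}
```

HTTP::Tiny does not decompress responses, which is why this version leaves out Accept-Encoding: gzip. Fetching https:// URLs with it also needs IO::Socket::SSL installed.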
To find out what a browser sends to a server in its headers along with the GET/POST/whatever request, the following CGI script can come in handy. It just echoes back the information the CGI environment has at its disposal.
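#! /usr/local/bin/perl -w
use strict;
use CGI;
my $q = CGI->new;
# One table row per environment variable. Values are HTML-escaped because
# headers such as User-Agent come straight from the client.
print $q->header(),
      $q->start_html( 'session echo' ),
      $q->h1( 'session echo' ),
      $q->table(
          $q->TR( { -valign=>'top' },
              [map { $q->th( {-align=>'right'}, $_ ) . $q->td( $q->escapeHTML( $ENV{$_} ) ) } sort keys %ENV]
          )
      ),
      $q->end_html();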
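The echo script shows the whole environment, server variables and all. Under CGI (RFC 3875), each request header reaches the script as an environment variable whose name is upper-cased, with '-' turned into '_' and HTTP_ prefixed, so User-Agent arrives as HTTP_USER_AGENT. A minimal sketch that keeps only those and turns the names back into header form, without needing CGI.pm:

```perl
use strict;
use warnings;

# Pick the HTTP_* variables out of an environment hash and map the names
# back to header style, e.g. HTTP_ACCEPT_LANGUAGE -> Accept-Language.
sub browser_headers {
    my ($env) = @_;
    my %headers;
    for my $key (keys %$env) {
        next unless $key =~ /^HTTP_(.+)$/;
        my $name = join '-', map { ucfirst lc } split /_/, $1;
        $headers{$name} = $env->{$key};
    }
    return \%headers;
}

# Only produce a page when actually running under a web server.
if (defined $ENV{GATEWAY_INTERFACE}) {
    my $headers = browser_headers( \%ENV );
    print "Content-Type: text/plain\r\n\r\n";
    print "$_: $headers->{$_}\n" for sort keys %$headers;
}
```

Two caveats: Content-Type and Content-Length are passed as CONTENT_TYPE and CONTENT_LENGTH without the HTTP_ prefix, and a server may choose not to pass some headers (Authorization, for example) at all.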
With a bit of experimentation you can tell what different browsers send. I used something like this at the time to work out the headers for sneakyget.
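If you don't have a web server handy, another way to do the same experiment is to listen on a local port yourself and point the browser at it. A minimal sketch using only IO::Socket::INET; the port and script name in the usage comment are arbitrary, and the header names are lower-cased since HTTP treats them case-insensitively (a repeated header keeps only its last value here).

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Split a raw request head into the request line and a hash of headers.
sub parse_request_head {
    my ($head) = @_;
    my ($request_line, @lines) = split /\r?\n/, $head;
    my %headers;
    for my $line (@lines) {
        last if $line eq '';
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $headers{ lc $name } = $value;
    }
    return ($request_line, \%headers);
}

# Hypothetical usage: perl headerdump.pl 8080, then browse to http://127.0.0.1:8080/
if (@ARGV) {
    my $port   = $ARGV[0];
    my $server = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Listen    => 5,
        ReuseAddr => 1,
        Proto     => 'tcp',
    ) or die "cannot listen on port $port: $@\n";
    while (my $client = $server->accept) {
        # Read the request head: everything up to the first empty line.
        my $head = '';
        while (defined(my $line = <$client>)) {
            $head .= $line;
            last if $line =~ /^\r?\n\z/;
        }
        next unless length $head;
        my ($request_line, $headers) = parse_request_head($head);
        print "$request_line\n";
        print "  $_: $headers->{$_}\n" for sort keys %$headers;
        # Send the raw head back so the browser displays it as well.
        print {$client} "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n", $head;
        close $client;
    }
}
```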
If you need to play around with a functional implementation of this CGI script, you can try it out here on jcwren's perlmonk server.
Finally, for reference (I just might come back here myself some day), the two main RFCs covering HTTP are RFC 1945 for version 1.0 and RFC 2616 for 1.1 (RFC 2616 has since been obsoleted, first by RFCs 7230 through 7235 and now by RFCs 9110 and 9112). Have fun.