The following isn't really the answer to your question, but when I read the title, I thought of something else. I might as well reply, since you never know who will hit this page from a search engine, and maybe this will be useful.

You see, to request a page from a web site, at the bare minimum you have to open a socket to port 80 of the remote machine, and send something like:

GET / HTTP/1.0

... with an extra newline to tell the remote server you aren't qualifying the query with other information (and I'm glossing over the definition of a newline...). Still, that is sufficient for a basic page from a basic server.
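For the curious, here is a minimal sketch of that bare-bones request using nothing but a socket. The sub names build_request and raw_get are mine, made up for illustration. As for the newline I glossed over: HTTP ends each line with CRLF, so the "extra newline" is really a second "\r\n". The script only touches the network when you give it a host on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Build the smallest useful HTTP/1.0 request. Each line ends with CRLF,
# and the empty line tells the server there are no more headers.
sub build_request {
    my ($path) = @_;
    return "GET $path HTTP/1.0\r\n\r\n";
}

# Open a TCP connection to port 80, send the request, and return the raw
# reply (status line, headers and body, exactly as the server sent them).
sub raw_get {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "cannot connect to $host:80: $@\n";
    print {$sock} build_request($path);
    local $/;                 # slurp mode: read until the server closes
    my $reply = <$sock>;
    close $sock;
    return $reply;
}

# Only go out on the network when a host is given on the command line.
if (@ARGV) {
    my ($host, $path) = @ARGV;
    print raw_get( $host, defined $path ? $path : '/' );
}
```

Run it with a host name and, optionally, a path. Many servers today also want a Host header before they will answer sensibly, which is one more reason the bare request is often not enough.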

That said, there are times when you come across a server and this is not enough. Maybe it insists on a particular version of Microsoft IE or Netscape Navigator (these days, that's getting rarer), or on some other piece of information, because the server is trying to distinguish between programs (such as those one might write in Perl) and humans sitting behind browsers clicking on buttons.

When this happens, you really do have to "fake" a web browser in Perl. To do so, you have to send more information along with your request, which hopefully will slide under the radar, and the server will think it's talking to just another user, clicking away in a browser.

The last time I had to do this, according to the date of the script, was 1999-11-04. I no longer recall what I needed this for, but I did name the script sneakyget :)

#! /usr/local/bin/perl -w

use strict;
use HTTP::Request;
use LWP::UserAgent;

$|++;

my $URL = shift or die "no url on command line\n";

my $ua = new LWP::UserAgent;
$ua->agent('Mozilla/4.7 [en] (Win95; I)');

my $r = new HTTP::Request;
$r->header(
    Accept          => [qw{image/gif image/x-xbitmap image/jpeg image/pjpeg image/png */*}],
    Accept_Charset  => [qw{iso-8859-1 * utf-8}],
    Accept_Encoding => 'gzip',
    Accept_Language => 'en',
    Connection      => 'Keep-Alive',
);
$r->method( 'GET' );
$r->uri( $URL );

my $res = $ua->request( $r );
print $res->content;
warn $res->code;

This was sufficient at the time for my nefarious purposes. Of course, these days one might have to update it a little with a more current OS and browser. The main point is that you can indeed "fake" a web browser with Perl.
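As a rough idea of what "updating it a little" might look like, here is a sketch, not a tested replacement for sneakyget. The User-Agent string and the Accept value are illustrative assumptions; copy the real ones from whatever browser you want to imitate. The sub name browser_request is also just something I made up. One thing worth knowing: if you ask for gzip, the server may well send gzip, so this version prints decoded_content rather than content.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Illustrative value only: replace it with the string your browser sends.
my $AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0';

# Build a GET request carrying browser-like headers. The underscores in
# the header names are turned into dashes by HTTP::Headers.
sub browser_request {
    my ($url) = @_;
    my $r = HTTP::Request->new( GET => $url );
    $r->header(
        Accept          => 'text/html,application/xhtml+xml,*/*;q=0.8',
        Accept_Language => 'en',
        Accept_Encoding => 'gzip',
    );
    return $r;
}

# Only go out on the network when a URL is given on the command line.
if (@ARGV) {
    my $ua  = LWP::UserAgent->new( agent => $AGENT );
    my $res = $ua->request( browser_request( $ARGV[0] ) );

    # decoded_content undoes the gzip encoding we asked for;
    # charset => 'none' keeps the body as bytes so print does not complain.
    my $body = $res->decoded_content( charset => 'none' );
    print defined $body ? $body : $res->content;
    warn $res->status_line, "\n";
}
```

The request is built in its own sub so you can inspect the headers without fetching anything.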

To find out what a browser sends to a server in its headers along with the GET/POST/whatever request, the following CGI script can come in handy. It just echoes back the information the CGI environment has at its disposal.

#! /usr/local/bin/perl -w

use strict;
use CGI;

my $q = new CGI;

print
    $q->header(),
    $q->start_html( 'session echo' ),
    $q->h1( 'session echo' ),
    $q->table(
        $q->TR(
            { -valign => 'top' },
            [ map { $q->th( {-align=>'right'}, $_ ) . $q->td( $ENV{$_} ) } sort keys %ENV ]
        )
    ),
    $q->end_html();

With a bit of experimentation you can tell what different browsers send. I used something like this at the time to build the above script.
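When reading the echo output, it helps to know that the CGI environment hands you most request headers as variables with an HTTP_ prefix, upper-cased and with dashes turned into underscores (Content-Type and Content-Length are the exceptions: they show up as CONTENT_TYPE and CONTENT_LENGTH). Here is a small sketch that maps the variables back to header-looking names. The sub name env_to_headers is hypothetical, and the capitalisation it produces is only a guess at how the browser spelled the header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn CGI variables such as HTTP_USER_AGENT back into header names such
# as User-Agent. Variables without the HTTP_ prefix are skipped.
sub env_to_headers {
    my ($env) = @_;
    my %headers;
    for my $key ( keys %$env ) {
        next unless $key =~ /\AHTTP_(.+)\z/;
        my $rest = $1;
        my $name = join '-', map { ucfirst( lc $_ ) } split /_/, $rest;
        $headers{$name} = $env->{$key};
    }
    return \%headers;
}

# When run as a CGI script, print the recovered headers as plain text.
if ( $ENV{GATEWAY_INTERFACE} ) {
    my $found = env_to_headers( \%ENV );
    print "Content-Type: text/plain\r\n\r\n";
    print "$_: $found->{$_}\n" for sort keys %$found;
}
```

The result is close to the header block you would feed to $r->header() in a script like sneakyget.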

If you need to play around with a functional implementation of this CGI script, you can try it out here on jcwren's perlmonk server.

Finally, for reference (I just might come back here myself some day), the two main RFCs covering HTTP are RFC 1945 for version 1.0 and RFC 2616 for 1.1. Have fun.


In reply to Re: How to "fake" web browser from Perl (and I mean /really/ fake) by grinder
in thread How to "fake" web browser from Perl by Anonymous Monk
