inblosam has asked for the wisdom of the Perl Monks concerning the following question:

My obstacle now seems to be that the site (adwords.google.com) checks for a referring URL (or something else I can't figure out). How do I mimic that? Otherwise I am not able to get the page I need to read/parse, because I get the following message (on the first result):

Sorry, we're unable to process this request. There are two possible causes for this error:
You accessed this page via a bookmark instead of entering through the AdWords Select homepage
You have disabled cookies in your browser.

So I need to mimic the referring URL, and perhaps Google is doing other things to keep people from reaching their pages in a non-traditional way. This is what my script looks like right now:

#!/usr/lib/perl -w
use strict;
use LWP::UserAgent;
use Crypt::SSLeay;
use HTTP::Request::Common;
use HTTP::Cookies;
use LWP::Simple;
use LWP::Debug qw(+);

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new(file => "cookie_jar", autosave => 1));
$ua->timeout(300);

# Log in by posting the login form
my $req = POST 'https://adwords.google.com/select/main',
    [ 'login.userid'   => 'test@test.com',
      'login.password' => 'testabc',
      'cmd'            => 'LoginValidation',
      'login'          => 'Login' ];
my $res = $ua->request($req);
unless ($res->is_success) {
    print "Login Failed: " . $res->status_line . "\n";
    exit 1;   # no point continuing without a session
}
my $first = $res->as_string;
print "\n\n\nThe 1st result is:\n$first\n";

# Fetch the campaign management page with the same user agent
$req = HTTP::Request->new(GET => 'https://adwords.google.com/select/main?cmd=CampaignManagement&campaignid=0&timeperiod=simple&timeperiod.simpletimeperiod=today');
$res = $ua->request($req);
my $second = $res->as_string;
print "\n\n\nThe 2nd result is:\n$second\n";


Michael Jensen
michael at inshift.com
http://www.inshift.com

Replies are listed 'Best First'.
Re: Mimic referring URL in LWP?
by pjf (Curate) on Jun 20, 2002 at 01:32 UTC
    HTTP::Headers has a referer method (yes, it's mis-spelt in the HTTP standard, too). It does exactly what you expect in that it sets the Referer header in the HTTP request.

    You should be able to tweak the HTTP::Headers object that is used by your request:

    $req->headers->referer("http://www.perlmonks.org/");

    In fact, according to the HTTP::Message documentation, all unknown methods are fobbed off to the appropriate HTTP::Headers object, so the following also works:

    $req->referer("http://www.perlmonks.org/");

    Couldn't be much more simple than that. :)
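
As a minimal, offline sketch of the two calls above: the example.com URLs are placeholders that are not part of the original question, and nothing is sent over the network. It only builds a request and prints it, so you can check that the Referer header ends up where you expect.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical URLs, used only for illustration.
my $req = HTTP::Request->new(GET => 'https://www.example.com/report');

# Set the header through the request object; HTTP::Message hands the
# call on to the underlying HTTP::Headers object.
$req->referer('https://www.example.com/login');

# Read it back both ways to confirm they refer to the same header.
my $via_headers = $req->headers->referer;
my $via_header  = $req->header('Referer');

# Show the request as it would be serialised.
print $req->as_string;
```

Printing the request before sending it is a cheap way to rule out a missing or misplaced header when a site rejects the request.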

    Paul Fenwick
    Perl Training Australia

      I tried the "referer" suggestions, but they don't seem to help. I set it after both my POST and my GET, like this:

      my $req = POST 'https://adwords.google.com/select/main',
          [ 'login.userid'   => 'test@test.com',
            'login.password' => 'testabc',
            'cmd'            => 'LoginValidation',
            'login'          => 'Login' ];
      $req->referer("https://adwords.google.com/select/");

      $req = HTTP::Request->new(GET => 'https://adwords.google.com/select/main?cmd=CampaignManagement&campaignid=0&timeperiod=simple&timeperiod.simpletimeperiod=today');
      $req->referer("https://adwords.google.com/select/main");

      Is the referrer page the one that brought me to the page I am posting to or getting (that was my assumption)?
      Also, in my headers returned from my "Post" I get this:
      Client-SSL-Warning: Peer certificate not verified

      I didn't know if that was an issue or not. They are looking for something that I am not sending, but I can't figure out what. Also, I noticed that the response to my "Get" is not the same as the first one, which seems a little odd:
      Your session has expired. Please return to the AdWords Select homepage and login again. (This is a security precaution to prevent someone from gaining access to your account if you forget to log-out.)
      Is the login working, then, but somehow my cookie shows the session as timed out? Thanks!

      Michael Jensen
      michael at inshift.com
      http://www.inshift.com
        It seems fairly clear that Google either does not expect or does not want automated scripts to access this particular facility. You may find it worthwhile asking them if they have a more programmer-friendly method of accessing the information that you're after, as that will save a lot of hassle trying to reverse-engineer the whole process.

        Judging from the session-expired message, I would guess that Google requires you to go through the whole login process to get a valid cookie: even if the cookies don't expire on the client, they do expire on Google's servers.

        It's very common for sites to stop accepting old cookies, particularly when money's involved. They want to avoid the situation of having cookies stored on a public computer, and a potential third-party accessing the content in question.

        If you were dealing with un-encrypted HTTP sessions, then you could use tcpdump/ethereal to log and examine what's happening "under-the-hood". However, since your connection is proceeding via SSL, that's not an easy option.
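
One way to look under the hood without a packet sniffer is to log each request inside the client, before it is encrypted. The following is a sketch only, and it assumes a newer libwww-perl than the one discussed in this thread, namely a release that provides LWP::UserAgent's add_handler method. The `data:` URL and the example.com referer are stand-ins so that the sketch runs without contacting any server.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my @sent;   # every outgoing request, as text
my $ua = LWP::UserAgent->new;

# The request_send handlers run just before a request is handed to the
# protocol layer, so they see the headers even on an SSL connection.
$ua->add_handler(request_send => sub {
    my ($request, $ua, $handler) = @_;
    push @sent, $request->as_string;
    return;   # returning nothing lets the request proceed as usual
});

# Stand-in request: a data: URL is resolved locally, not over the network.
my $req = HTTP::Request->new(GET => 'data:,hello');
$req->referer('https://www.example.com/login');   # hypothetical referer
my $res = $ua->request($req);

print "--- request as sent ---\n", @sent;
print "--- response status ---\n", $res->status_line, "\n";
```

Pointing the same handler at the real login and report requests would show which headers and cookies actually leave the script, which can then be compared with what a browser sends.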

        Paul Fenwick
        Perl Training Australia

Re: Mimic referring URL in LWP?
by ehdonhon (Curate) on Jun 20, 2002 at 01:25 UTC
    Here's a snippet from a recent script I had to write. It shows how you can supply referer information. Hope this is what you are looking for.

    my $req = HTTP::Request->new(GET => $location);
    $req->referer($referer);

Re: Mimic referring URL in LWP?
by LiTinOveWeedle (Scribe) on Jun 20, 2002 at 12:53 UTC
    Hi, you can also use something like:

    $res = $ua->request(
        POST 'https://adwords.google.com/select/main',
            [ 'login.userid'   => 'test@test.com',
              'login.password' => 'testabc',
              'cmd'            => 'LoginValidation',
              'login'          => 'Login' ],
            Referer => $referer
    );

    For things like this it is also a good idea to set up a cookie jar:

    my $cookie_jar = HTTP::Cookies->new;
    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar($cookie_jar);
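
To illustrate what the cookie jar does between two requests, here is a small offline sketch. The example.com URLs, the `session` cookie name and its value are invented for illustration, and the login response is built by hand instead of coming from a server.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

my $jar = HTTP::Cookies->new;   # in-memory jar, nothing written to disk

# Pretend this is the response to a login POST (all values hypothetical).
my $login = HTTP::Request->new(POST => 'https://www.example.com/login');
my $res   = HTTP::Response->new(200, 'OK');
$res->header('Set-Cookie' => 'session=abc123; path=/');
$res->request($login);          # the jar needs to know which URL set the cookie

# Store the cookie, as LWP::UserAgent does after each response.
$jar->extract_cookies($res);

# Attach matching cookies to the next request, as LWP::UserAgent does
# before each request when a cookie jar is configured.
my $next = HTTP::Request->new(GET => 'https://www.example.com/report');
$jar->add_cookie_header($next);

print $next->as_string;
```

If the second request in a real script goes out without the cookie header, the session cookie was either never set by the login response or does not match the host or path of the follow-up URL.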

    Hope that this will help you.
    Regards
    Li Tin O've Weedle
    mad Tsort's philosopher

      Thanks for the suggestion. Unfortunately, it didn't seem to make a difference. Could there be something else I need to set that Google may be checking for?

      Michael Jensen
      michael at inshift.com
      http://www.inshift.com