Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm using LWP::UserAgent to pull down some webpages. One particular webpage will not let me grab it because it insists that I need Javascript enabled in order to view the page. Is there any way I can "fake" this via LWP::UserAgent or some other means?

Re: LWP & Javascript
by dws (Chancellor) on Jul 25, 2002 at 04:04 UTC
    One particular webpage will not let me grab it because it insists that I need Javascript enabled in order to view the page.

    You're going to need to reverse engineer the mechanism by which the server is determining that you don't have Javascript enabled. It shouldn't be too difficult. The page immediately beforehand is probably trying to do something like setting a cookie from within a <script> block.

    Once you know, you'll be able to work around it.
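
One way to start the reverse engineering is to scan the preceding page's source for assignments to document.cookie. The following is only a rough sketch: the helper name find_script_cookies and the sample HTML are made up for illustration, and a regex scan will miss cookies that the script builds dynamically.

```perl
use strict;
use warnings;

# Return the name/value pairs assigned to document.cookie in a chunk of HTML.
# This is a crude regex scan, not a JavaScript interpreter: it only catches
# literal assignments such as  document.cookie = "name=value; path=/";
sub find_script_cookies {
    my ($html) = @_;
    my @found;
    while ($html =~ /document\.cookie\s*=\s*(["'])(.*?)\1/g) {
        # Keep only the leading name=value part; drop attributes like path.
        my ($pair) = split /\s*;\s*/, $2;
        next unless defined $pair && $pair =~ /=/;
        my ($name, $value) = split /=/, $pair, 2;
        push @found, [ $name, $value ];
    }
    return @found;
}

# Hypothetical page source, invented for this example.
my $html = <<'HTML';
<script>
document.cookie = "jsok=1; path=/";
</script>
HTML

printf "%s=%s\n", @$_ for find_script_cookies($html);
```

If the scan finds nothing, the check is probably done some other way (a hidden form field filled in by script, for instance), and reading the page source by hand is the fallback.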

      Another, possibly more common, and naive, mechanism would be to simply sniff the user agent and compare it to a browscap (Browser Capabilities) database. In this case, passing the User-Agent header from a known JavaScript-capable browser ought to work.
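
A minimal sketch of sending a browser-like User-Agent header with LWP::UserAgent. The agent string is just one example of a JavaScript-capable browser's string, and whether a given server actually checks it is an assumption.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Any string from a real JavaScript-capable browser will do; this one
# is only an example.
my $ua = LWP::UserAgent->new(
    agent => 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)',
);

# Every request made through $ua now carries this User-Agent header.
print $ua->agent, "\n";
```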

      --
      perl -pew "s/\b;([mnst])/'$1/g"

      Thanks for the reply dws but no dice. I'm already accepting cookies:
      $ua->cookie_jar(HTTP::Cookies->new(file => "cookies.yum", autosave => 1));
      Here's some more detail. I'm attempting to submit a form and "scrape" the results. Problem being that the webpage uses Javascript form validation and even uses Javascript to do the "submit". I'm really scratching my head here. Here's a snippet of the actual code:
      my $queryStr = "http://track.airborne.com/TrackByNbr.asp?txtTrackNbrs=$trackNo&hdnTrackMode=nbr&hdnPostType=init&hdnRefPage=0&hdnSent=false";
      my $ua = new LWP::UserAgent;
      my $agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)";
      $ua->agent($agent);
      $ua->cookie_jar(HTTP::Cookies->new(file => "cookies.yum", autosave => 1));
      my $req = new HTTP::Request POST=>$queryStr;
      $req->content_type('application/x-www-form-urlencoded');
      my $res = $ua->request($req);
      if ($res->is_success) {
          my $response = $res->content;
          open (HTMLDUMP, ">$HTMLFile") || die "Could not create $HTMLFile - $!\n";
          print HTMLDUMP $response;
          close (HTMLDUMP);
      }
      else {
          die "Something bad happened...\n";
      }
        no dice. I'm already accepting cookies

        Yes, but unless you're executing the Javascript that comes back with a page, you might not be sending cookies that are created from within <script> tags. That is a standard technique that sites use to determine whether the browser has Javascript enabled. A cookie jar isn't going to do you any good in this case.

Re: LWP & Javascript
by mitd (Curate) on Jul 25, 2002 at 08:18 UTC
    Read dws's responses! He has handed you the answer.
    At the site you are interested in, there are one or more pages that serve as the only access points to the page you are trying to get.

    Those pages are either testing for Javascript and then setting a cookie, or simply using Javascript to set the cookie. Either way, no JS means no cookie, and no cookie means no page of interest.

    The simple thing to do is go to one of these 'access' pages, view the source, and see what cookie is being set. Then, in your LWP script, put the discovered cookie into your cookie jar.
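
Putting a cookie into the jar by hand might look roughly like this. The cookie name and value ('jsok' and '1') are placeholders, not what the site really sets; substitute whatever you find in the access page's source. The request at the end is only built, never sent, so you can check the header offline.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

my $jar = HTTP::Cookies->new;

# set_cookie(version, key, value, path, domain, port,
#            path_spec, secure, maxage, discard)
# 'jsok' => '1' is a placeholder; use whatever the access page really sets.
$jar->set_cookie(0, 'jsok', '1', '/', 'track.airborne.com',
                 undef, 1, 0, 3600, 0);

# Check what would be sent, without touching the network.
my $req = HTTP::Request->new(GET => 'http://track.airborne.com/TrackByNbr.asp');
$jar->add_cookie_header($req);
print $req->header('Cookie'), "\n";
```

To use the jar for real requests, attach it to the user agent with $ua->cookie_jar($jar) before fetching the page of interest.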

    This is a fairly common practice. There are more complicated methods, but they can still be worked around. Try this way and let us know.

    Oh yeah, and don't forget to thank dws, since all I did was shout his answer back at you :)

    mitd-Made in the Dark
    'Interactive! Paper tape is interactive!
    If you don't believe me I can show you my paper cut scars!'

Re: LWP & Javascript
by vek (Prior) on Jul 29, 2002 at 18:09 UTC
    I see you're trying to automatically track Airborne Express shipments. We've been doing that here for a while with LWP::UserAgent. Don't worry about the Javascript, but do make sure you are accepting cookies.

    The crux of your problem is the query string you are sending. It looks like you viewed the HTML source to try and figure out how to construct it. That won't work. Try this instead:
    http://track.airborne.com/TrackByNbr.asp?ShipmentNumber=$yourTrackNo
    -- vek --
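
A hedged sketch of building the simpler URL that vek suggests. The helper name tracking_url and the tracking number are invented for illustration; the URI module is used so the value gets URL-escaped properly.

```perl
use strict;
use warnings;
use URI;

# Build the tracking URL; query_form takes care of URL-escaping the value.
sub tracking_url {
    my ($track_no) = @_;
    my $uri = URI->new('http://track.airborne.com/TrackByNbr.asp');
    $uri->query_form(ShipmentNumber => $track_no);
    return $uri->as_string;
}

# '1234567890' is a made-up tracking number.
print tracking_url('1234567890'), "\n";
```

The resulting string can then be passed to $ua->get(...) on an LWP::UserAgent that has a cookie jar attached, instead of POSTing the long hidden-field query string.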