Vautrin has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to scrape a web site. Whenever I use LWP::UserAgent to get anything beyond the first page, I am shown a warning page telling me I need to turn on Javascript.

So I figured, no problem: I found the point in the HTML where the web page is redirected to a new URL and requested that URL directly. Oddly enough, when I tried doing that I got a web page telling me that my browser doesn't support cookies.

So, I know that there is some black magic going on at this site. The problem is, I need a way to walk through what it's doing with Javascript and Cookies to see what's happening.

Is there any way to:

a) Dump a cookie jar (an instance of HTTP::Cookies) in a human-readable format
b) Interface with the Mozilla Javascript libraries, perhaps, to get the web page the way the site's scripts would produce it
c) Figure out what's going on?

Thanks in advance,

Dan

Replies are listed 'Best First'.
Re: Scraping a web page while taking into account the javascript
by Anonymous Monk on Jan 22, 2004 at 01:35 UTC
    a) cookies are already very human-readable (perl -MCGI::Cookie -le"die CGI::Cookie->new(-name=>'name',-value=>'value',-expires=>'+2d')" yields name=value; path=/; expires=Sat, 24-Jan-2004 01:34:00 GMT, which seems awfully readable to me)

    b) Why yes, just use JavaScript or JavaScript::SpiderMonkey

    c) What's going on is you need to become a browser (or as smart as one). There have been a lot of Seekers Of Perl Wisdom posts about this (the easiest method is to use a browser and see what it does, then mimic accordingly).
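
For the HTTP::Cookies jar asked about in (a), here is a minimal sketch of one way to dump it, using the module's scan and as_string methods. The helper name dump_jar and the session cookie for www.example.com are made up for illustration; in a real run the jar would be the one handed to LWP::UserAgent and filled by the site's responses.

```perl
use strict;
use warnings;
use HTTP::Cookies;

# In-memory jar. In a scraper it would be attached to the user agent, e.g.
#   my $ua = LWP::UserAgent->new(cookie_jar => $jar);
my $jar = HTTP::Cookies->new;

# Hypothetical helper: one "domain path name=value" line per cookie.
sub dump_jar {
    my ($cookie_jar) = @_;
    my @lines;
    # scan() calls the callback once for every cookie in the jar.
    $cookie_jar->scan(sub {
        my ($version, $key, $val, $path, $domain, $port,
            $path_spec, $secure, $expires, $discard, $rest) = @_;
        push @lines, sprintf "%s %s %s=%s%s",
            $domain, $path, $key, $val, $secure ? " (secure)" : "";
    });
    return join "\n", @lines;
}

# Made-up cookie standing in for whatever the site actually sets.
# Arguments: version, name, value, path, domain, port, path_spec,
#            secure, max-age in seconds, discard.
$jar->set_cookie(0, 'session', 'abc123', '/', 'www.example.com',
                 undef, 1, 0, 3600, 0);

print dump_jar($jar), "\n";

# as_string() gives the jar's own dump, one Set-Cookie3 line per cookie.
print $jar->as_string;
```

Calling dump_jar after each request shows which cookies the site has set so far, which should help narrow down why it decides the client "doesn't support cookies".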