in reply to Retrieving web pages with the LWP::UserAgent

Could be any number of things. My two best guesses would be:

Another useful tip in situations like this is to install Firefox's LiveHTTPHeaders extension and watch exactly which requests and responses pass between the browser and the server. You might be missing important headers.
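To compare what LWP sends with what LiveHTTPHeaders shows for the browser, you can have LWP print a request without sending it. This is a minimal sketch that assumes LWP::UserAgent and HTTP::Request::Common are installed. The header values and the `Query` field are placeholders, so replace them with whatever the capture actually shows.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Placeholder values: copy the real ones from the LiveHTTPHeaders capture.
my $url     = 'http://www.stat-usa.gov/nct_all.nsf/advSearch';
my $referer = 'http://www.stat-usa.gov/nct_all.nsf/advSearch';

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');    # some sites treat non-browser agents differently

# Build the request with extra headers a browser would send.
# 'Query' is a hypothetical form field name.
my $req = POST $url,
    Referer => $referer,
    Accept  => 'text/html',
    Content => [ Query => 'example' ];

# Let the user agent add its default headers (e.g. User-Agent) without sending.
$req = $ua->prepare_request($req);

# Print the whole request so it can be compared line by line with the browser's.
print $req->as_string;
```

Diff the printed request against the browser's. Any header that appears only on the browser side is a candidate for the missing piece.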

--
<http://dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg


Re^2: Retrieving web pages with the LWP::UserAgent
by bart (Canon) on Sep 07, 2006 at 09:28 UTC
    Your second idea, about the session ID, was worth pursuing. So I tried the URL manually and got a search page. When I removed the "session ID", I got a page with just two links: one to a plain search page and one to an advanced search page. Apparently the OP has been using the latter, whose canonical URL is http://www.stat-usa.gov/nct_all.nsf/advSearch.

    When I looked at this page's source, the form's action attribute was /nct_all.nsf/2d58b7a34bbaa3838525703f004f804e?CreateDocument: exactly the same strange ID. So apparently it isn't a variable session ID at all, but most likely an identifier generated by their website creation tool.
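Reading the action attribute by hand works, but you can also script the same lookup. This sketch assumes the HTML::Form module, which recent releases ship separately from LWP. The inline HTML is a trimmed, hypothetical stand-in for the real advSearch page, and `Query` is a made-up field name.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical, trimmed stand-in for the advSearch page. In practice this
# would be the decoded content of a GET on the advSearch URL.
my $html = <<'HTML';
<form method="post"
      action="/nct_all.nsf/2d58b7a34bbaa3838525703f004f804e?CreateDocument">
  <input type="text" name="Query" value="">
  <input type="submit" value="Search">
</form>
HTML

my $base = 'http://www.stat-usa.gov/nct_all.nsf/advSearch';

# parse() returns one HTML::Form object per <form> in the document.
my ($form) = HTML::Form->parse($html, $base);

# The action is resolved against the page URL, query string included.
print $form->method, ' ', $form->action, "\n";

# Fill in a field and build the request the form would submit.
$form->value(Query => 'example');
my $req = $form->click;
print $req->as_string;
```

Because HTML::Form builds the request with the method the form declares, the `?CreateDocument` part survives on this POST form.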

    Note the part after the question mark: "CreateDocument". I suggest the OP try it using POST with this part appended. Obviously, this wouldn't work with GET, because a GET form submission replaces the URL's query string with the form data.
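URI and HTTP::Request::Common make the difference easy to see. In this small sketch, the `Query` field is a hypothetical stand-in for the real form fields. The GET case imitates how a browser builds a GET submission; it is not taken from the OP's code.

```perl
use strict;
use warnings;
use URI;
use HTTP::Request::Common qw(GET POST);

my $action = 'http://www.stat-usa.gov/nct_all.nsf/'
           . '2d58b7a34bbaa3838525703f004f804e?CreateDocument';

# Hypothetical form fields; use the names from the real form.
my @fields = (Query => 'example');

# POST: the URL keeps ?CreateDocument and the fields travel in the body.
my $post = POST $action, \@fields;
print $post->uri, "\n";
print $post->content, "\n";

# GET, built the way a browser builds it: query_form() replaces the
# existing query string, so the CreateDocument command disappears.
my $get_uri = URI->new($action);
$get_uri->query_form(@fields);
my $get = GET $get_uri;
print $get->uri, "\n";
```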

    I tried the OP's code as currently posted, changing only this (and reassembling the broken-up words), and it works for me.

Re^2: Retrieving web pages with the LWP::UserAgent
by mrguy123 (Hermit) on Sep 06, 2006 at 13:30 UTC
    Sorry about the 'GET'; it should have been 'POST', although the result is the same.
    I tried it with newer session IDs and got the same result.
    I will follow your advice about the HTTP headers.
    Do you know if there is another way that a website stores info besides cookies and session IDs?
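On the cookie side, LWP::UserAgent keeps no cookies at all unless it has a cookie jar. Without one, a session that an earlier page set up can be lost between requests. This is a minimal sketch of an in-memory jar. The `SessionID` cookie is invented only to show what the jar sends back; nothing suggests this site actually uses it.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request;

# In-memory cookie jar: cookies from responses are stored here and sent
# back automatically on later requests to the same site.
my $jar = HTTP::Cookies->new;
my $ua  = LWP::UserAgent->new(cookie_jar => $jar);

# Simulate a cookie the site might have set earlier
# (hypothetical name and value, for illustration only).
# Args: version, key, value, path, domain, port, path_spec, secure, maxage, discard
$jar->set_cookie(0, 'SessionID', 'abc123', '/', 'www.stat-usa.gov',
                 undef, 0, 0, 3600, 0);

# Show which Cookie header the jar would add to the next request.
my $req = HTTP::Request->new(GET => 'http://www.stat-usa.gov/nct_all.nsf/advSearch');
$jar->add_cookie_header($req);
print $req->header('Cookie'), "\n";
```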