Re: Retrieving web pages with the LWP::UserAgent

Could be any number of things. My two best guesses would be:

Maybe the form processor only accepts POST requests. Why not try POSTing the request instead?
The "2d58b7a34bbaa3838525703f004f804e" part of your URL looks like it might be a session ID. Perhaps that session has expired.

Another useful tip in situations like this is to install Firefox's LiveHTTPHeaders extension and see exactly what the HTTP interaction looks like. You might be missing important headers.
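To compare against what the browser sends, you can also make LWP show its own side of the conversation. A rough sketch using `add_handler` on the user agent; the `@seen` array and the URL-on-the-command-line convention are only for illustration:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my @seen;    # illustrative log of everything that went over the wire

# Record each outgoing request (request line, headers and body).
$ua->add_handler( request_send => sub {
    my ($request) = @_;
    push @seen, $request->as_string;
    return;    # returning nothing lets the request proceed normally
} );

# Record the status line and headers of each final response.
$ua->add_handler( response_done => sub {
    my ($response) = @_;
    push @seen, $response->status_line . "\n" . $response->headers->as_string;
    return;
} );

# Fetch the URL given on the command line, if any, and print the log.
if (@ARGV) {
    $ua->get( $ARGV[0] );
    print STDERR "$_\n" for @seen;
}
```

Putting this output next to the LiveHTTPHeaders capture should make any missing header stand out.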

--
<http://dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Re^2: Retrieving web pages with the LWP::UserAgent by bart (Canon) on Sep 07, 2006 at 09:28 UTC
Your second idea, about the session ID, was worth pursuing. So I tried the URL manually, and I got a search page. I tried removing the "session ID" and I got a page with just 2 links: one to a plain search page, and one to an advanced search page. Apparently it's the latter the OP has been using, and its canonical URL is http://www.stat-usa.gov/nct_all.nsf/advSearch. And when I looked in this page's source, the form's action attribute was `/nct_all.nsf/2d58b7a34bbaa3838525703f004f804e?CreateDocument`: the exact same strange ID. So no, apparently it's not variable, but likely generated by their web site creation tool. Do note the part after the question mark: "`CreateDocument`". I propose the OP tries it using POST with this part appended to the URL. Obviously, this wouldn't work with GET. I did try the OP's code as posted at this time, with just this changed (and the broken-up words reassembled), and it works for me.
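For reference, a minimal sketch of that kind of POST, not the OP's actual code. The field name `Query` and its value are made up, so substitute the real inputs from the advanced search form; the `SEND_REQUEST` environment variable is just my own guard so the script can be run once to inspect the request without touching the site:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Form action from the advSearch page source, including "?CreateDocument".
my $url = 'http://www.stat-usa.gov/nct_all.nsf/'
        . '2d58b7a34bbaa3838525703f004f804e?CreateDocument';

# Hypothetical field name and value; use the form's real inputs here.
my %form = ( Query => 'example' );

# Build a POST request: the fields are URL-encoded into the body,
# while "?CreateDocument" stays in the URL.
my $req = POST $url, \%form;

# Show the outgoing request before sending anything.
print $req->as_string;

# Only go over the network when explicitly asked to.
if ( $ENV{SEND_REQUEST} ) {
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->request($req);
    print $res->is_success ? $res->decoded_content : $res->status_line, "\n";
}
```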
Re^2: Retrieving web pages with the LWP::UserAgent by mrguy123 (Hermit) on Sep 06, 2006 at 13:30 UTC
Sorry about the 'GET', it should have been 'POST', although the result is the same. I tried it with newer session IDs, and got the same result. I will use your advice for the HTTP headers. Do you know if there is another way that a website stores info besides cookies and session IDs?