jthornton has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am having a horrible problem... I have a script that pulls product defect data down from a web site every 10 minutes. The script uses LWP::UserAgent and HTTP::Request. The URL I am passing targets a report in the web app, basically passing a new date range in the URL each time.

Here is the issue... I'm getting HORRIBLE cache problems. The HTTP response is returning reports that are old or have nothing to do with the request I just sent. But I can take that exact URL, put it into my IE browser on my Windows machine, and the web app returns the correct data every time. So it's something about my script, or how I am sending the HTTP headers, that makes something think it's OK to hand back an old copy.

So, what are the commands/steps I need to take to ensure that the client request says not to use a cached copy of anything? Anyone know... or any hints? I've tried messing with the proxy settings; with a proxy or without I get the same result.

Thanks in advance, jthornton
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;

sub get_response {
    my $url = $_[0];
    my $ua  = LWP::UserAgent->new(
        agent => 'Mozilla/4.0 (compatible; MSIE 6.0; Win32)' );
    $ua->cookie_jar( HTTP::Cookies->new );

    my $request  = HTTP::Request->new( GET => $url );
    my $response = $ua->request($request);

    # Chase 301/302 redirects by hand
    while ( $response->code eq "301" or $response->code eq "302" ) {
        $url      = $response->header('Location');
        $request  = HTTP::Request->new( GET => $url );
        $response = $ua->request($request);
    }
    return $response->as_string;
}

Replies are listed 'Best First'.
Re: HTTP::Request and caching...
by starbolin (Hermit) on Apr 21, 2005 at 00:04 UTC

    You need to form a request header. There are several ways. Try:

    my $request = HTTP::Request->new( GET => $url, [ 'Pragma' => 'no-cache' ] );
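
    If whatever is caching speaks HTTP/1.1, it is worth sending Cache-Control as well. A minimal sketch, assuming the $ua and $url from your sub:

        my $request = HTTP::Request->new( GET => $url );
        $request->header( 'Pragma'        => 'no-cache' );   # for HTTP/1.0 caches
        $request->header( 'Cache-Control' => 'no-cache' );   # for HTTP/1.1 caches
        my $response = $ua->request($request);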



Re: HTTP::Request and caching...
by jthornton (Initiate) on Apr 21, 2005 at 00:03 UTC
    Another question related to this... does HTTP::Request cache any information locally like a browser might?

      HTTP::Request creates the request structure; it does not fetch web pages. That function is provided by LWP::UserAgent. In your code you call the constructors for these two classes inside your subroutine, so the instances are local to the subroutine and go out of scope when you exit it. That should take care of any data persistence problem. So your answer is no.
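
      To make the division of labor concrete, a minimal sketch:

          my $ua       = LWP::UserAgent->new;                # does the actual fetching
          my $request  = HTTP::Request->new( GET => $url );  # just a message object, no I/O
          my $response = $ua->request($request);             # the network traffic happens here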

      Note that Perl does not hand memory back to the operating system until the program exits; memory freed when the instances go out of scope is reused internally, so repeated calls should not grow your process without bound. Constructing a fresh LWP::UserAgent on every call is still needless overhead, though, depending on how many times your subroutine is called.



        Hmmmm... yeah, this problem is driving me crazy. Thanks for the responses so far. I've tried sending the client request with a no-cache header... I've also tried changing the document age to something very high so it will always be considered "not fresh". Something out there is caching this information and serving bad data.

        What's strange is that the URL I'm passing is different each time (dynamically generated by filling in new date-range parameters). If I take the same exact URL and drop it into my browser I get the correct HTTP response every single time. But using LWP::UserAgent it's bad bad bad. What is strange too is that every time I make a request the responding app creates a cookie with a session ID... I can see the cookie, and it's different every time... so I assume the request is actually making it to the server.

        I see this problem with a proxy server and without... BUT, if I move from one proxy to another, every few hours I'll get one good response... then all responses after that are old and incorrect. The cache munchkins are toying with me! Arg!

        Btw, in this particular situation the subroutine is only being called once... then the program exits. But how would I avoid that construction overhead anyway? Thanks...
        Hmm... well, making forward progress... kind of. The HTTP response I am getting back reports this via $response->fresh_until and friends:
            Fresh Until:        1/16/2008 16:16:15
            Current Age:        0
            Freshness Lifetime: 9/26/1972 17:00:00
            Is Fresh:           1

        I've added this to the request:

            $request->header( "Pragma"        => "no-cache" );
            $request->header( "Cache-control" => "no-cache" );

        I even tried:

            $request->header( "Cache-control" => "max-age=-1" );
        No luck... any HTTP gurus out there? How can I control what the age is and ensure that my request rejects this doc that for some reason has a ridiculously large lifetime? I'm assuming the issue is with my HTTP headers, since my browser gets the correct data back every time... hmmmpf.
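
        One workaround not tried above, but common when an intermediary ignores cache headers: make every request URL unique with a throwaway query parameter, so no cached copy can ever match. A sketch; the _ts parameter name is invented for illustration:

            use URI;

            my $uri = URI->new($url);
            $uri->query_form( $uri->query_form, _ts => time() );   # e.g. ...&_ts=1114128000
            my $request = HTTP::Request->new( GET => $uri );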
Re: HTTP::Request and caching...
by starbolin (Hermit) on Apr 24, 2005 at 23:17 UTC

    I fixed up the object allocation and ran this on a timeserver that updates once per second. Had no problems.

    use LWP::UserAgent;
    use HTTP::Request;

    my $url = 'http://example.com/time';   # placeholder; the time server's URL isn't shown
    my $ua  = LWP::UserAgent->new(
        agent => 'Mozilla/4.0 (compatible; MSIE 6.0; Win32)' );

    while (1) {
        my $string = get_response( $url, $ua );
        $string =~ /\d+:\d+:\d+/;           # pull the HH:MM:SS timestamp out of the page
        print $&, "\n";
    }

    sub get_response {
        my ( $url, $ua ) = @_;
        my $request  = HTTP::Request->new( GET => $url );
        my $response = $ua->request($request);
        return $response->as_string;
    }

    Note that there is no need to explicitly process 301/302 redirects. LWP::UserAgent handles these automatically. From the LWP::UserAgent documentation:
    "The request() method will process redirects and authentication responses transparently."
