Is LWP::Simple's get function caching pages?

by CombatSquirrel (Hermit)
on Sep 17, 2004 at 22:33 UTC ( [id://391914]=perlquestion )

CombatSquirrel has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,
I'm currently trying to load and parse a web page (weather information) every hour to get an overview of how the temperature develops over the day. However, it seems as if LWP::Simple's get function were caching the page, since it gives me the same page over and over again, even though the content of the resource has changed.
Here's a simplified version of the code:
    #!perl
    use strict;
    use warnings;
    use LWP::Simple;

    while (1) {
        my $page = get 'http://de.weather.yahoo.com/GRXX/GRXX0024/index_c.html';
        # get() returns undef on failure and does not set $!
        defined $page or die "Failure: could not fetch page\n";
        $page =~ /Heute(?:\s*<[^>]+>)*\s*([^<]+)/i or die "No match\n";
        print $1, "\n";
        sleep 60 * 60;    # wait one hour
    }
Does anyone know what is going on? Is get actually caching the page or did I just screw up the script?
Thanks in advance,
CombatSquirrel.

Entropy is the tendency of everything going to hell.

Replies are listed 'Best First'.
Re: Is LWP::Simple's get function caching pages?
by Fletch (Bishop) on Sep 18, 2004 at 02:15 UTC

    Check and make sure you don't have http_proxy or the like set in your environment pointing at a caching proxy.
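
For reference, the LWP::Simple documentation says its internal user agent initializes its proxy defaults from the environment (by calling env_proxy), so variables such as http_proxy do affect get. A minimal sketch for listing whatever is set; the helper name proxy_vars is made up for illustration:

```perl
use strict;
use warnings;

# Return the names of proxy-related variables (http_proxy, NO_PROXY, ...)
# that have a non-empty value in the given environment hash.
sub proxy_vars {
    my ($env) = @_;
    return sort grep {
        /_proxy\z/i && defined $env->{$_} && length $env->{$_}
    } keys %$env;
}

# Print the proxy settings of the current process, if any.
for my $name (proxy_vars(\%ENV)) {
    print "$name=$ENV{$name}\n";
}
```

If this prints nothing, the script is not being pointed at a proxy through the environment, and any caching must be happening elsewhere.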

      At least outside of Perl, I don't use a proxy. I suppose you mean my OS' settings by "environment", but I'm sure I have no proxy set there. Is there a Perlish/LWP::Simpleish equivalent?
      Thanks for your response anyways.
      CombatSquirrel.

Re: Is LWP::Simple's get function caching pages?
by diotalevi (Canon) on Sep 17, 2004 at 22:47 UTC
    LWP::Simple does not cache the page.
Re: Is LWP::Simple's get function caching pages?
by bart (Canon) on Sep 18, 2004 at 08:12 UTC
    No, LWP::Simple doesn't cache pages, but something else might. If there is no intermediate proxy, check the webserver itself. It will probably show the same problem in a browser.

    Is this a static page (a file) that you change on the server? In such a case, Apache is known to cache pages for a while.

    A simple solution for such a problem is to replace the static page on the server with a CGI script that returns the file contents, preceded by an appropriate content-type header and a blank line. It's either that, or finding a way to tell the server not to cache the files. I wouldn't know how, though.
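
A minimal sketch of the CGI approach described above, assuming the static file is HTML and sits next to the script under the hypothetical name weather.html. The Cache-Control line is an addition that the advice above does not mention, and the helper name cgi_response is made up for illustration:

```perl
use strict;
use warnings;

# Build a complete CGI response for a static file:
# header lines, a blank line, then the file contents.
sub cgi_response {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    local $/;                 # slurp mode: read the whole file at once
    my $body = <$fh>;
    close $fh;
    return "Content-Type: text/html\r\n"
         . "Cache-Control: no-cache\r\n"    # assumption, not in the original advice
         . "\r\n"
         . $body;
}

# Hypothetical file name; nothing is printed if the file is missing.
my $file = 'weather.html';
print cgi_response($file) if -e $file;
```

This only helps when you control the server, of course.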

      You're right about the browser behaving the same way, but reloading the page solved it. I would have supposed that to be a sign that the browser caches the page, not the server. I don't think that LWP::Simple inherits browser settings, though. That leaves me with my ISP possibly caching pages, although I can't figure out why a reload would solve the problem in that case.
      And I definitely can't change the page itself (it's Yahoo's weather service - and yes, I checked: the terms of service don't seem to forbid the collection of weather data).
      Thanks for your help.
      CombatSquirrel.

        No, force-reloading the page probably sent along extra headers to the webserver, telling it to reread the file.

        Now, if you can figure out what those headers are, you can duplicate that behaviour with LWP, though you will probably have to use something slightly more powerful (= lower level) than LWP::Simple.
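
A minimal sketch of that idea, assuming the headers in question are Cache-Control: no-cache and Pragma: no-cache, which browsers commonly send on a forced reload. Whether the server or an intermediate cache honours them is not guaranteed. The helper name fresh_request is made up for illustration:

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Build a GET request that asks caches along the way not to answer
# from a stored copy without checking with the origin server.
sub fresh_request {
    my ($url) = @_;
    return HTTP::Request->new(
        GET => $url,
        [
            'Cache-Control' => 'no-cache',    # HTTP/1.1 caches
            'Pragma'        => 'no-cache',    # older HTTP/1.0 caches
        ],
    );
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->request(fresh_request($url));
    die 'Failure: ', $res->status_line, "\n" unless $res->is_success;
    print $res->decoded_content;
}
```

Run it with the page URL as the only argument. Unlike LWP::Simple's get, the response object also reports the status line on failure, which makes it easier to tell a failed request from a stale page.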

Re: Is LWP::Simple's get function caching pages?
by CombatSquirrel (Hermit) on Sep 18, 2004 at 10:58 UTC
    Just for the record: the problem persists (or rather, it reappeared). When I hit "Reload" in my browser, nothing happens; the page only updates when I re-enter the URL. And when I reopen the browser, it's stuck with the old version.
    In the script I tried
    $page = get 'http://...'; sleep 2; $page = get 'http://...';
    and it's working so far, but I don't know whether it will continue to. If it doesn't, I'll post an update.
    CombatSquirrel.

      Nope, that didn't help. The first time it worked and now it's back to the old version, even if I restart the script. Guess I'll give up.
      CombatSquirrel.

