in reply to Re: How to get web creation date from webserver?
in thread How to get web creation date from webserver?

Hi all,

Using httrack, I have downloaded the page from a given URL and am working with it offline. I have to check the URL daily to see whether the page has been modified or not. If the page has been modified I have to download it again, else I have to exit. So, for this purpose I want to get the date the page was last modified. Is that possible? Please help me.


Re^3: How to get web creation date from webserver?
by jhourcle (Prior) on Aug 23, 2005 at 10:25 UTC

    Read the HTTP specification. Specifically, section 14.25, 'If-Modified-Since'.

    You send back the 'Last-Modified' timestamp from when you cached the file (or the date you fetched it, though then you have to generate the date format yourself) in an 'If-Modified-Since' request header. If the file hasn't been modified, and the webserver supports this header, it should return a '304' status rather than the full content all over again.
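    For example, a minimal conditional-GET sketch with LWP (untested; the URL and cache file name are placeholders):

        use LWP::UserAgent;
        use HTTP::Date qw(time2str);

        my $url  = 'http://example.com/page.html';   # placeholder URL
        my $file = 'page.html';                      # local cached copy

        my $ua = LWP::UserAgent->new;

        # send If-Modified-Since based on the cached file's mtime
        my @conditional;
        push @conditional, 'If-Modified-Since' => time2str( (stat $file)[9] )
            if -e $file;

        my $response = $ua->get( $url, @conditional );

        if ( $response->code == 304 ) {
            print "Not modified -- keeping the cached copy\n";
        }
        elsif ( $response->is_success ) {
            open my $fh, '>', $file or die "Can't write $file: $!";
            print {$fh} $response->content;
            close $fh;
            print "Changed -- saved a fresh copy\n";
        }
        else {
            warn "Fetch failed: ", $response->status_line, "\n";
        }

    Note that LWP::UserAgent's mirror() method does essentially the same conditional GET and saves the file for you, so for plain mirroring that one call may be all you need.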

Re^3: How to get web creation date from webserver?
by holli (Abbot) on Aug 23, 2005 at 07:08 UTC
    So you don't want to know when the page was created; you want to know whether the page has changed since you last visited/downloaded it. I don't know of a ready-made Perl way to do this, but there are a lot of programs for it, e.g. webmon.


    holli, /regexed monk/
      You could fetch the page with LWP, calculate & store an MD5 checksum, then simply compare the current checksum with the last one.

      Code hastily snipped and sanitised :)

      use LWP::UserAgent;
      use Digest::MD5 qw/md5_hex/;

      sub web_MD5 {
          # get the MD5 sum of a URL's content
          my $url = shift;
          my $ua  = LWP::UserAgent->new(
              env_proxy  => 1,
              keep_alive => 1,
              timeout    => 30,
          );
          my $response = $ua->get($url);
          # warn on failure (but note: execution falls through regardless)
          warn "Error while getting ", $response->request->uri,
               " -- ", $response->status_line, "\nAborting"
              unless $response->is_success;
          my $doc = $response->content();
          my $md5 = md5_hex($doc);
          undef $ua;
          return $md5;
      }
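      For the "store and compare" part, a minimal sketch (the checksum file name and URL are placeholders):

          my $checksum_file = 'page.md5';   # hypothetical file holding the last checksum
          my $last = '';
          if ( -e $checksum_file ) {
              open my $in, '<', $checksum_file or die "Can't read $checksum_file: $!";
              $last = <$in>;
              close $in;
          }
          my $now = web_MD5('http://example.com/page.html');
          if ( $now ne $last ) {
              print "Page has changed since the last check\n";
              open my $out, '>', $checksum_file or die "Can't write $checksum_file: $!";
              print {$out} $now;
              close $out;
          }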
        A few suggestions:

        Once you have fetched the page, you could simply and blindly write it over the old version on disk. That would be cheaper than calculating a checksum, especially because as it stands you have to download the page twice (your function does not return the fetched data). So I would at least alter your code to return the data as well:
        return ($md5, \$response);
        Also, you do not return (but only warn) when the fetch fails. I didn't check what happens when you md5_hex an undef value or what the code returns then, but it should return undef to indicate the failure to the caller. So, even if your code works, it would simply be clearer if you explicitly returned when the fetch fails:
        unless ($response->is_success) {
            warn "Error while getting ", $response->request->uri,
                 " -- ", $response->status_line, "\nAborting";
            return;
        }


        holli, /regexed monk/

      Dear holli,

      Is there any such tool for Linux? Also, I want the check to run automatically, and based on the result run a Perl file to do the download. With webmon I would have to check daily by hand; is there any other way to get the information about changes made to the page automatically? Thanks.

        The command-line tool wget supports mirroring of sites (or pages) and has options for fetching only files that are newer than your local copies (see its man page). As for the "check daily" part, that sounds like a job for crontab.
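        For example (untested; the URL, path, and schedule are placeholders):

            # fetch the page only if the server copy is newer than the local one
            wget -N http://example.com/page.html

            # crontab entry: run the check quietly every day at 06:00
            0 6 * * * wget -N -q -P /home/user/pages http://example.com/page.html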