in reply to Re: Check website has update file using www::mechanize
in thread Check website has update file using www::mechanize

Is this possible with HTTP::Response?

If yes, how can I get the last-modified information from the HTTP response?

Re^3: Check website has update file using www::mechanize
by Corion (Patriarch) on May 25, 2016 at 06:57 UTC

    HTTP has provisions for not sending data if it has not been modified since a given timestamp. See the ->mirror method of LWP::UserAgent and/or the If-Modified-Since header of HTTP.
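A minimal sketch of the ->mirror approach, assuming a plain LWP::UserAgent (WWW::Mechanize inherits from it, so $mech->mirror should behave the same way). The URL, the local file name, the --run switch, and the mirror_status helper are all hypothetical names chosen for illustration. mirror sends If-Modified-Since based on the local file's timestamp, and a 304 status means the server reported no change.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Classify the response returned by mirror ():
# 304 (Not Modified) means the local copy was left untouched.
sub mirror_status {
    my ($res) = @_;
    return "unchanged" if $res->code == 304;
    return "updated"   if $res->is_success;
    return "failed: " . $res->status_line;
}

# Hypothetical URL and local file name; replace with your own.
my ($url, $file) = ("http://example.com/data.zip", "data.zip");

# Only touch the network when called with --run (an illustrative switch),
# so the helper above can be tried without a connection.
if (@ARGV && $ARGV[0] eq "--run") {
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->mirror ($url, $file);
    print "$file: ", mirror_status ($res), "\n";
}
```

Printing the response object itself only shows a reference; the status code and status_line are what tell you whether anything was downloaded.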

      Which is by no means a guarantee that the data did or did not change. I deal with government data all the time, and their sites just list the ZIP/Excel/CSV/PDF files. You actually have to fetch the files in order to check if they changed (or their content changed).

      My approach is

      • Read persistent file with ZIP/CSV file checksums
      • Read site and parse links
      • For each link with a file I want/know
        • Fetch file into memory
        • Calculate SHA256
        • Compare to previous SHA256
        • If the same, skip to the next link
        • Otherwise, save file
        • store SHA256
        • log/mail/other action(s)
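The checksum part of the list above could be sketched roughly as follows. This is only an illustration under assumptions: the "checksum name" text format of the persistent file and the helper names load_sums, save_sums and content_changed are made up here, not taken from the original setup. Digest::SHA ships with Perl.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Load "checksum name" lines from the persistent file into a hash ref
sub load_sums {
    my ($path) = @_;
    my %sums;
    open my $fh, "<", $path or return \%sums;   # first run: nothing stored yet
    while (my $line = <$fh>) {
        chomp $line;
        my ($sum, $name) = split " ", $line, 2;
        $sums{$name} = $sum if defined $name;
    }
    close $fh;
    return \%sums;
}

# Write all known checksums back, one "checksum name" pair per line
sub save_sums {
    my ($path, $sums) = @_;
    open my $fh, ">", $path or die "$path: $!";
    print $fh "$sums->{$_} $_\n" for sort keys %$sums;
    close $fh or die "$path: $!";
}

# True if $content differs from what was seen before for $name;
# the new checksum is recorded in that case
sub content_changed {
    my ($sums, $name, $content) = @_;
    my $sum = sha256_hex ($content);
    return 0 if defined $sums->{$name} && $sums->{$name} eq $sum;
    $sums->{$name} = $sum;
    return 1;
}
```

In the loop over the links, the fetched bytes (for example $mech->get ($link)->content) would be passed to content_changed, and the file saved and the action triggered only when it returns true.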

      Enjoy, Have FUN! H.Merijn

      Yeah, it's working, but I still have the same problem

      my $res=$mech>mirror('download_link');
      print " response is :",$res,"\n\n"; # no content

      The file gets downloaded when I run it, but I need to know whether the file has been updated: if it has, download it; otherwise just print a message

        I hope that this is just a cut-n-paste error, as that is not what you mean. Missing a dash:

        my $res = $mech->mirror ("download_link");
        #              ^ there
        print " response is :", $res, "\n\n"; # no content

        If you were using use strict; and use warnings; running the code would have shown you.

        As I showed in my action list in this thread, the fact that a page that contains the links is not updated does not mean that the files it links to are not updated.

        You should post more of the real code for us to check if you are checking the right headers.
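For reference, a small sketch of which headers might be worth inspecting on an HTTP::Response. The change_hints helper and the sample header values are hypothetical; the response is built by hand so the snippet runs offline, whereas in real code it would be whatever $mech->get or $mech->head returns. Whether a server sends Last-Modified or ETag at all is up to that server.

```perl
use strict;
use warnings;
use HTTP::Response;

# Summarise the headers that hint at whether a resource has changed
sub change_hints {
    my ($res) = @_;
    my $lm = $res->last_modified;     # epoch seconds, or undef if not sent
    return {
        status        => $res->code,
        last_modified => defined $lm ? scalar gmtime ($lm) : "(not sent)",
        etag          => $res->header ("ETag") // "(not sent)",
    };
}

# Hand-built response with made-up header values, for illustration only
my $res = HTTP::Response->new (200, "OK", [
    "Last-Modified" => "Wed, 25 May 2016 06:57:00 GMT",
    "ETag"          => '"abc123"',
    ]);

my $hints = change_hints ($res);
print "$_: $hints->{$_}\n" for sort keys %$hints;
```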


        Enjoy, Have FUN! H.Merijn