in reply to Re^2: Check website has update file using www::mechanize
in thread Check website has update file using www::mechanize

HTTP has provisions for not sending data unless it has been modified since a given timestamp. See the ->mirror method of LWP::UserAgent and/or the If-Modified-Since header of HTTP.
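As a rough illustration of the If-Modified-Since idea, the sketch below builds a conditional GET by hand. The helper name conditional_get, the URL, and the file name are made up for the example; ->mirror does roughly this for you, using the modification time of the local file.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper (not part of LWP): build a GET request that asks the
# server to skip the body if the resource has not changed since the local
# copy in $file was last written.
sub conditional_get {
    my ($url, $file) = @_;
    my $req   = HTTP::Request->new(GET => $url);
    my $mtime = (stat $file)[9];          # undef if the file does not exist yet
    # if_modified_since takes epoch seconds and formats the HTTP date for us
    $req->if_modified_since($mtime) if defined $mtime;
    return $req;
}

# Usage (URL and file name are placeholders):
#   my $req = conditional_get("http://example.com/data.zip", "data.zip");
#   my $res = LWP::UserAgent->new->request($req);
#   # a 304 status then means "not modified, nothing was sent"
```

Whether this helps depends entirely on the server honouring the header and reporting sensible modification times.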

Re^4: Check website has update file using www::mechanize
by Tux (Canon) on May 25, 2016 at 07:26 UTC

    Which is by no means a guarantee that the data did or did not change. I deal with government data all the time, and their sites just list the ZIP/Excel/CSV/PDF files. You actually have to fetch the files in order to check if they changed (or their content changed).

    My approach is

    • Read persistent file with ZIP/CSV file checksums
    • Read site and parse links
    • For each link with a file I want/know
      • Fetch file into memory
      • Calculate SHA256
      • Compare to previous SHA256
      • If same, skip to the next link; otherwise:
      • save file
      • store SHA256
      • log/mail/other action(s)
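The steps above could be sketched roughly as follows. This is only an outline under some assumptions: changed_files is a made-up name, the checksum store is a plain hash keyed by link (loading and saving it, for instance with Storable, is left out), and fetching is passed in as a code ref so the comparison logic can be tried without a network.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: $seen is a hashref of link => SHA256 from the last
# run, $fetch is a code ref returning the raw bytes for a link (undef on
# failure). Returns link => data for every file that is new or changed.
sub changed_files {
    my ($seen, $fetch, @links) = @_;
    my %changed;
    for my $link (@links) {
        my $data = $fetch->($link);                   # fetch file into memory
        defined $data or next;                        # failed; retry next run
        my $sum = sha256_hex($data);                  # calculate SHA256
        next if ($seen->{$link} // "") eq $sum;       # same and next
        $seen->{$link}  = $sum;                       # store SHA256
        $changed{$link} = $data;                      # caller saves, logs, mails
    }
    return %changed;
}

# With LWP::UserAgent the fetcher could look like this (untested sketch):
#   my $ua    = LWP::UserAgent->new;
#   my $fetch = sub { my $r = $ua->get($_[0]); $r->is_success ? $r->content : undef };
```

The caller then writes each returned file to disk and persists the updated checksum hash before exiting.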

    Enjoy, Have FUN! H.Merijn
Re^4: Check website has update file using www::mechanize
by perlmad (Sexton) on May 25, 2016 at 10:32 UTC

    Yes, it's working, but I still have the same problem:

    my $res=$mech>mirror('download_link');
    print " response is :",$res,"\n\n"; # no content

    The file is downloaded when I run it, but I need to know whether the file has been updated: if it has, download it; otherwise just print a message.

      I hope that this is just a cut-n-paste error, as that is not what you mean. Missing a dash:

      my $res = $mech->mirror ("download_link");
      #              ^ there
      print " response is :", $res, "\n\n"; # no content

      If you are using use strict; and use warnings; running the code would show you.

      As I showed in my action list in this thread, the fact that a page that contains the links is not updated does not mean that the files it links to are not updated.

      You should post more of the real code for us to check if you are checking the right headers.


      Enjoy, Have FUN! H.Merijn

        Yes, you are correct. When I pass one link to mirror as an argument, the file is actually downloaded, and when I pass a second link it returns a hash reference: "response :HTTP::Response=HASH(0x8b555e4)"

        my $res=$mech->mirror('firstlink','firstfile.zip');
        print "response :",$res,"\n\n";
        $res=$mech->mirror('secondlink','secondfile.zip');
        print "response :",$res,"\n\n";

        How can I change the hash reference to a string?

        I tried this, roughly. Is it correct?

        print "response : ",%{$res},"\n\n"; # is correct???
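For what it's worth, the value printed as HTTP::Response=HASH(...) is an object, so dereferencing it with %{$res} only dumps its internals. The usual route is to call its methods, such as ->code, ->is_success and ->status_line. A minimal sketch, assuming the server answers a conditional request with 304 when nothing changed; describe_mirror is a made-up helper name:

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: turn the HTTP::Response returned by ->mirror into a
# short human-readable string.
sub describe_mirror {
    my ($res) = @_;
    return "not modified" if $res->code == 304;   # 304 = Not Modified
    return "updated"      if $res->is_success;    # 2xx: file was (re)written
    return "failed: " . $res->status_line;        # e.g. "404 Not Found"
}

# Usage with the calls from the post (links are placeholders):
#   my $res = $mech->mirror('firstlink', 'firstfile.zip');
#   print "response : ", describe_mirror($res), "\n\n";
```

Note that this only reports what the server claims; a server that ignores If-Modified-Since will always look "updated".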