Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to be able to, say, go through a list of bookmarks in an HTML file, have LWP get each one, and save it. Then, when I run it the next day, it should do the same thing but compare each page to the saved version, detect any changed text, and display only the changed portions. I know there's some site-updated HTML tag, but that isn't used on all sites. Thanks.

Replies are listed 'Best First'.
Re: detecting site changes
by rinceWind (Monsignor) on Sep 06, 2004 at 15:29 UTC
    Dear Anonymous Monk,

    What exactly is your question? Is it whether this kind of site diffing has been done before? Or is it whether this kind of thing is doable? (it clearly is in Perl) Or is it a question of needing help on how best to do it?

    Usually, you will get helpful feedback if you post code, to show what you have tried so far. Alternatively, post pseudocode showing your intended design. This is not a site where you can go and expect people to write substantial code for you for free.

    However, if you just need some tips on which modules to look into, look at the following:

    Hope this helps

    --
    I'm Not Just Another Perl Hacker

Re: detecting site changes
by ranjan_jajodia (Monk) on Sep 06, 2004 at 15:36 UTC
    What is your question? I would think it is a pretty straightforward thing. You need the HTML::Parse module. Go through LWP's documentation. It is pretty solid.

    use LWP::UserAgent;
    use HTML::Parse;    # exports parse_html()

    # Save $var->content as an HTML file (to match against the next day's
    # file) in the normal manner (not shown here).
    my $user_agent = LWP::UserAgent->new;
    my $var = $user_agent->get('url');    # 'url' is a placeholder
    my $parsed_html = parse_html($var->content);
    # format() renders the tree as plain text (needs HTML::FormatText installed)
    my $txt_on_the_page = $parsed_html->format;

    # For the earlier saved HTML ($oldfile stands for whatever holds it):
    my $old_parsed_html = parse_html($oldfile->content);
    my $txt_on_the_old_page = $old_parsed_html->format;

    # Now compare $txt_on_the_page and $txt_on_the_old_page.

    I have put in very little code and taken a very optimistic approach, but with a little effort you will get what you need.
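The comparison step that the reply above leaves open could be sketched roughly as follows. This is only a minimal line-by-line diff built on a longest-common-subsequence table, in core Perl with no CPAN modules; the sub name changed_lines and the sample page texts are made up for illustration and do not come from the thread. On CPAN, Algorithm::Diff or Text::Diff would be the more usual choice.

```perl
use strict;
use warnings;

# Return the lines of $new that are not part of the longest common
# subsequence of lines shared with $old (i.e. added or changed lines).
# Hypothetical helper, named here only for illustration.
sub changed_lines {
    my ($old, $new) = @_;
    my @a = split /\n/, $old;
    my @b = split /\n/, $new;

    # $lcs[$i][$j] = length of the LCS of @a[$i..$#a] and @b[$j..$#b]
    my @lcs;
    $lcs[$_] = [ (0) x (scalar(@b) + 1) ] for 0 .. scalar(@a);
    for my $i (reverse 0 .. $#a) {
        for my $j (reverse 0 .. $#b) {
            if ($a[$i] eq $b[$j]) {
                $lcs[$i][$j] = $lcs[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
                $lcs[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }

    # Walk the table; every line of @b skipped on the way is new or changed.
    my @changed;
    my ($i, $j) = (0, 0);
    while ($i < @a && $j < @b) {
        if    ($a[$i] eq $b[$j])                      { $i++; $j++ }
        elsif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1])  { $i++ }
        else                                          { push @changed, $b[$j++] }
    }
    push @changed, @b[$j .. $#b];    # anything left over in the new text
    return @changed;
}

# Made-up sample data standing in for the two formatted page texts.
my $txt_on_the_old_page = "Welcome\nNews: nothing yet\nContact us\n";
my $txt_on_the_page     = "Welcome\nNews: new release\nContact us\nFAQ\n";

print "$_\n" for changed_lines($txt_on_the_old_page, $txt_on_the_page);
```

With the sample data, only the replaced news line and the added FAQ line are printed. Lines that were deleted from the old page are not reported, which seems to match the request to show only the changed portions of the current page.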
Re: detecting site changes
by zby (Vicar) on Sep 06, 2004 at 16:24 UTC
    You have it in your browser: most browsers now in use can notify you when a bookmarked page has changed.

    The problem with this approach is that most dynamic pages contain the date and/or the current time somewhere, so every download looks changed. To cope with that you need a mechanism for recognizing such trivial changes, and that is not trivial. I am currently writing a web application for that exact task: Active Bookmarks. You can download the code; it is GPLed.

    Additionally, just a few days ago I posted the code at the heart of the application in Yet another HTML diff.
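
One simple way to recognize the trivial date and time changes described above is to mask timestamps before comparing the two versions. The sketch below is an assumption for illustration and is not taken from Active Bookmarks; the sub name mask_volatile, the placeholder strings, and the handful of patterns are all hypothetical and cover only a few common formats.

```perl
use strict;
use warnings;

# Replace common date and time patterns with fixed placeholders so that
# two downloads differing only in a timestamp compare as equal.
# Hypothetical helper; the patterns are deliberately incomplete.
sub mask_volatile {
    my ($text) = @_;

    # Clock times such as 15:29 or 15:29:07, optionally followed by am/pm
    $text =~ s/\b\d{1,2}:\d{2}(?::\d{2})?(?:\s?[ap]m)?\b/<TIME>/gi;

    # ISO-style dates such as 2004-09-06
    $text =~ s/\b\d{4}-\d{2}-\d{2}\b/<DATE>/g;

    # Dates such as "Sep 06, 2004" or "September 6 2004"
    $text =~ s/\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+\d{1,2},?\s+\d{4}\b/<DATE>/g;

    return $text;
}

# Made-up page texts that differ only in their timestamp.
my $yesterday = "Last updated Sep 05, 2004 at 09:12\nHeadline: all quiet\n";
my $today     = "Last updated Sep 06, 2004 at 15:29\nHeadline: all quiet\n";

print mask_volatile($yesterday) eq mask_volatile($today)
    ? "no real change\n"
    : "content changed\n";
```

Masking is applied to both the saved and the fresh text before any diff is run, so a page whose only difference is its "last updated" stamp is reported as unchanged. Counters, rotating ads, and session IDs would need their own patterns.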