You can get misleading information if a document you fetch from a remote server has "old" content but claims to be freshly generated. This can happen because the server parsed and changed it, or because a content management system regenerated it (for example, after a layout change that also affected the document). Even worse, the document may not exist at all and is generated on request from some data source.

I'm not 100% clear on what you think the problem is. If you're trying to detect whether a remote server is presenting new content for a page, and are being foiled by automatically generated timestamps in headers or footers (or elsewhere on the page), and you really, really need to know if content has changed, then I see two options for you.

First, write page-specific processing code that strips out the dynamic parts. Then, compute an MD5 hash on what's left. If that hash hasn't changed since the last time you looked, you don't have new content.
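
Here is a rough sketch of that first option, not a finished implementation. The two patterns in `strip_dynamic` are assumptions made up for illustration (a "generated" HTML comment and ISO-style timestamps); a real page will need its own rules. `strip_dynamic` and `content_fingerprint` are hypothetical helper names.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical, page-specific cleanup: remove the parts that change on every request.
sub strip_dynamic {
    my ($html) = @_;
    # Assumed marker: an HTML comment that starts with "generated".
    $html =~ s/<!--\s*generated\b.*?-->//gis;
    # Assumed format: ISO-like timestamps such as 2002-03-14 09:26:53.
    $html =~ s/\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(?::\d{2})?\b//g;
    # Collapse whitespace so that pure reflow does not change the hash.
    $html =~ s/\s+/ /g;
    $html =~ s/^ | $//g;
    return $html;
}

# MD5 of whatever is left after the dynamic parts are stripped.
sub content_fingerprint {
    my ($html) = @_;
    # md5_hex wants bytes, so encode in case the page was decoded to characters.
    return md5_hex(encode_utf8(strip_dynamic($html)));
}

my $old = "<p>Price list</p>\n<p>Updated 2002-03-14 09:26:53</p>";
my $new = "<p>Price list</p>\n<p>Updated 2002-03-15 17:02:10</p>";

print content_fingerprint($old) eq content_fingerprint($new)
    ? "no new content\n"
    : "content changed\n";
```

In practice you would store the fingerprint from the last fetch and compare it with the one from the current fetch.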

The other approach is to use Algorithm::Diff to do a diff, then try to get smart (perhaps on a page-by-page basis) about which differences you really care about. For example, if the text fragments that differ look like dates or times, ignore them.
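
A minimal sketch of the diff approach, assuming a line-by-line comparison and a deliberately naive idea of what "looks like a date or time". The regex and the helper names `mask_dates` and `significant_changes` are hypothetical; `sdiff` is the side-by-side function exported by Algorithm::Diff.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(sdiff);

# Assumed heuristic: these shapes count as a date or a time.
my $date_or_time = qr/\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}:\d{2}(?::\d{2})?\b/;

# Replace date- and time-like fragments with a placeholder.
sub mask_dates {
    my ($line) = @_;
    (my $masked = $line) =~ s/$date_or_time/<DATE>/g;
    return $masked;
}

# Return the differences that remain after ignoring date/time-only changes.
sub significant_changes {
    my ($old_lines, $new_lines) = @_;
    my @kept;
    for my $row (sdiff($old_lines, $new_lines)) {
        my ($flag, $old, $new) = @$row;
        next if $flag eq 'u';    # unchanged line
        # A changed line whose masked forms match differs only in its timestamp.
        next if $flag eq 'c' && mask_dates($old) eq mask_dates($new);
        push @kept, [$flag, $old, $new];
    }
    return @kept;
}

my @old = ('Price list', 'Widgets: 5 EUR', 'Generated 2002-03-14 09:26:53');
my @new = ('Price list', 'Widgets: 6 EUR', 'Generated 2002-03-15 17:02:10');

for my $change (significant_changes(\@old, \@new)) {
    my ($flag, $old, $new) = @$change;
    print "$flag  '$old' -> '$new'\n";
}
```

Only the changed price line should survive the filter here; the "Generated" line differs in nothing but its timestamp and is skipped. Lines that were added or removed outright (flags `+` and `-`) are always kept.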


In reply to Re: Far OT (was Re: Changing and checking timestamps for) remote (files) by dws
in thread Changing and checking timestamps for files by LukeyBoy
