in reply to Eliminating "duplicate" domains from a hash/array
As any given URL could be a CGI script or an HTML page that uses server-side includes, there is no way to guarantee that even fetching the same URL twice, within any given timeframe, will return identical content.
Any mechanism for determining whether the results of different URLs are the same will have to rely on fetching them and comparing the results. This might lead to some optimisation in storage, by having the two URLs point to the same data, but pre-determining it is just not possible.
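For illustration, a minimal sketch of that fetch-and-compare approach, assuming LWP and Digest::SHA are available (the URLs and the %seen hash are mine, not from the thread): hash each response body, and let any URL whose digest has already been seen share the first URL's stored copy.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);
use Digest::SHA qw(sha256_hex);

# Map each content digest to the first URL that produced it; later
# URLs with the same digest are duplicates sharing one stored copy.
my %seen;
my @urls = ('http://example.com/', 'http://www.example.com/');  # hypothetical

for my $url (@urls) {
    my $body = get($url);
    next unless defined $body;           # fetch failed; skip
    my $digest = sha256_hex($body);
    if (exists $seen{$digest}) {
        print "$url duplicates $seen{$digest}\n";
    }
    else {
        $seen{$digest} = $url;
    }
}
```

Note that this only tells you the two bodies matched *at the moment of fetching*, which is exactly the limitation described above.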
Even storing the data offline is fraught with problems, in that there is no guarantee that the content of an entirely static page will not be updated a day/hour/minute/second/microsecond after you captured and stored it.
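The best you can do is detect staleness after the fact. A conditional GET asks the server whether the page has changed since your copy was taken; a rough sketch, again assuming LWP (the changed_since sub is hypothetical):

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Date qw(time2str);

# Conditional GET: ask the server whether the page changed
# since the epoch time at which we stored our copy.
sub changed_since {
    my ($url, $last_fetch_epoch) = @_;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($url,
        'If-Modified-Since' => time2str($last_fetch_epoch));
    return 0 if $res->code == 304;   # Not Modified: stored copy still current
    return 1 if $res->is_success;    # 200 OK: server sent a newer version
    die 'fetch failed: ', $res->status_line;
}
```

Even this relies on the server honouring If-Modified-Since, which a CGI or SSI page may well not do.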