Some background - I am developing a scraper and it needs to know if it has scraped the page already - hence the normalization. It needn't be perfect, just good enough for all but the most arcane. Needs only work with http.
What about using the "last_modified" method in LWP? Keep track of it locally. When you access the page again, check the time it was modified and skip it if that time is not newer than what you've saved.