in reply to Quantitative Change instead of Boolean

Sounds to me like you're thinking about something like a "web diff". The problem is that the rules for it are not as well established as they are for, say, diff and patch, and telling big changes from minor ones is not easy either. E.g. you could move block-level elements around in the source and see no difference in rendering, because the rules for that live in the CSS.

If you are interested in text only, it's easy, I think. Text in normal files breaks at line ends, which is not the case in HTML. I'd suggest stripping the text from the HTML (e.g. with Tom Christiansen's striphtml), removing empty lines and leading/trailing whitespace, jamming it all together and breaking it again at punctuation. The stretches of text between punctuation marks are then your lines, which you could run through diff.
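A rough sketch of that idea, using core modules only. The tag-stripping regex is a crude stand-in for striphtml, not a replacement for it, and the sub names (sentences, lcs_length, change_ratio) are made up for illustration. Instead of calling diff, it takes the longest common subsequence of the two sentence lists and turns that into a single number, which may be closer to a quantitative measure than a list of hunks.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Crude stand-in for a real HTML stripper: drops script/style blocks and tags.
# It will stumble on comments, entities and '>' inside attribute values.
sub sentences {
    my ($html) = @_;
    (my $text = $html) =~ s{<(script|style)\b.*?</\1\s*>}{ }gis;
    $text =~ s/<[^>]*>/ /g;
    $text =~ s/\s+/ /g;          # jam everything together
    $text =~ s/^ | $//g;         # trim the single leading/trailing space
    # break again after sentence punctuation
    return grep { length } split /(?<=[.!?;:])\s+/, $text;
}

# Length of the longest common subsequence of two sentence lists.
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (scalar(@$y) + 1);
    for my $i (1 .. scalar(@$x)) {
        my @cur = (0);
        for my $j (1 .. scalar(@$y)) {
            $cur[$j] = $x->[$i - 1] eq $y->[$j - 1]
                     ? $prev[$j - 1] + 1
                     : max($prev[$j], $cur[$j - 1]);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Share of sentences that changed: 0 means identical text,
# 1 means the two pages have no sentence in common.
sub change_ratio {
    my ($old_html, $new_html) = @_;
    my @old = sentences($old_html);
    my @new = sentences($new_html);
    my $longest = max(scalar @old, scalar @new) or return 0;
    return 1 - lcs_length(\@old, \@new) / $longest;
}

my $old = '<p>Nice post. Thanks!</p><p>3 comments.</p>';
my $new = '<p>Nice post. Thanks!</p><p>18 comments.</p>';
printf "%.2f\n", change_ratio($old, $new);
```

Where to draw the line between "minor" and "big" is up to you; the ratio only gives you something to put a threshold on.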

If you are interested in markup/layout changes, you could parse the page into a DOM tree (e.g. with HTML::Tree) and compare its content, starting from the twigs.
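One way this could look, as a sketch only: walk each tree bottom-up, give every subtree a fingerprint built from its tag and its children's fingerprints, and count the subtrees in the new page that have no identical counterpart in the old one. The MD5 fingerprints, the decision to ignore attributes, and the sub names are my assumptions for the example, not anything HTML::Tree prescribes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;          # from the HTML::Tree distribution (not core)
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Post-order walk: the twigs get fingerprinted first, then each parent
# from its tag plus its children's fingerprints. Attributes are ignored.
# %$seen counts how often each subtree fingerprint occurs.
sub fingerprint {
    my ($node, $seen) = @_;
    my $sig;
    if (ref $node) {                          # an HTML::Element
        my @kids = map { fingerprint($_, $seen) } $node->content_list;
        $sig = md5_hex(join '|', $node->tag, @kids);
    }
    else {                                    # a plain text node
        (my $text = $node) =~ s/\s+/ /g;
        $sig = md5_hex(encode_utf8("#text:$text"));
    }
    $seen->{$sig}++;
    return $sig;
}

sub subtree_counts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my %seen;
    fingerprint($tree, \%seen);
    $tree->delete;                            # free the tree explicitly
    return \%seen;
}

# Number of subtrees in the new page without an identical twin in the old one.
sub changed_subtrees {
    my ($old_html, $new_html) = @_;
    my ($old, $new) = map { subtree_counts($_) } $old_html, $new_html;
    my $changed = 0;
    for my $sig (keys %$new) {
        my $extra = $new->{$sig} - ($old->{$sig} // 0);
        $changed += $extra if $extra > 0;
    }
    return $changed;
}

my $page = '<div><p>a</p><p>b</p></div>';
print changed_subtrees($page, '<div><p>a</p><p>c</p></div>'), "\n";
```

Note that a changed twig also changes the fingerprint of every ancestor, so the count grows with nesting depth. If that bothers you, you could count only the deepest changed subtrees instead.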

Just some ideas...

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re^2: Quantitative Change instead of Boolean
by titivillus (Scribe) on Jun 30, 2006 at 14:33 UTC
    Sounds to me like you're thinking about something like a "web diff".
    Not really. Or maybe "yes, but...". A web diff, properly speaking, should point out the changes. I'm hoping to quantify the difference between a small change, like "18 comments" instead of "3 comments", and a big change, like new blog posts (yes, RSS; I do that too) or anything else that's more significant than a spelling change or a raised digit. But thanks for the help.

    .sig goes here