in reply to Quantitative Change instead of Boolean
If you are interested in text only, it's easy, I think. In normal text files the line breaks are meaningful, so diff can compare line by line; in HTML they are not, because the browser reflows the text anyway. I'd suggest stripping the text from the HTML (with e.g. Tom Christiansen's striphtml), removing empty lines and leading/trailing whitespace, jamming it all together, and breaking it again at punctuation. The pieces between punctuation marks then become your lines, which you could run through diff.
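A minimal sketch of the text-only approach, assuming Python instead of Perl and using the standard library's `html.parser` in place of striphtml (a swapped-in tool, not the one named above). The helper names `TextExtractor`, `sentence_lines` and `text_diff` are hypothetical, and so is the choice of which punctuation counts as a break (here `. ! ? ; :`), so abbreviations like "e.g." will also split a line.

```python
import difflib
import re
from html.parser import HTMLParser

# Assumed set of block-level tags that should separate words when stripped.
BLOCK = {"p", "div", "br", "li", "td", "th", "tr", "table", "ul", "ol",
         "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK:
            self.parts.append(" ")  # keep words in adjacent blocks apart

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def sentence_lines(html):
    """Strip markup, jam the text together, and re-break it at punctuation."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all whitespace (line ends, blank lines, indentation) to one space.
    text = " ".join("".join(parser.parts).split())
    # Split after sentence punctuation; each chunk becomes one "line" for diff.
    chunks = re.split(r"(?<=[.!?;:])\s+", text)
    return [c.strip() for c in chunks if c.strip()]


def text_diff(old_html, new_html):
    """Unified diff of the punctuation-delimited text of two HTML pages."""
    return list(difflib.unified_diff(
        sentence_lines(old_html), sentence_lines(new_html),
        fromfile="old", tofile="new", lineterm=""))
```

Because the original line breaks are discarded, a change that touches only the markup, such as `<p>Same.</p>` versus `<div>Same.</div>`, produces an empty diff.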
If you are interested in changes to markup or layout, you could parse the page into a DOM tree (e.g. with HTML::Tree) and compare its content starting from the twigs, i.e. the leaf nodes.
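The tree comparison can be sketched the same way. This Python version builds a small tree with `html.parser` instead of HTML::Tree (an assumption; HTML::Tree's own API is not used here), then lists every twig (each text node and each empty element such as `<br>`) together with its full path of tags, sibling positions and attributes, and diffs those lists. A change in an ancestor's attributes therefore shows up on every twig beneath it. `Node`, `TreeBuilder`, `twigs`, `leaf_lines` and `tree_diff` are hypothetical names, and the builder does not emulate a browser's error recovery (for example, implied `</p>` tags).

```python
import difflib
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []  # Node objects or whitespace-normalized strings


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.stack[-1].children.append(text)


def segment(node, index):
    """One path step: tag, position among same-tag siblings, attributes."""
    attrs = " ".join(k if v is None else f'{k}="{v}"'
                     for k, v in sorted(node.attrs.items()))
    return f"{node.tag}[{index}]" + ("{" + attrs + "}" if attrs else "")


def twigs(node, path=""):
    """Yield one line per leaf: its full path plus its text, if any."""
    counts = {}
    for child in node.children:
        if isinstance(child, str):
            yield f"{path}: {child}"
            continue
        counts[child.tag] = counts.get(child.tag, 0) + 1
        child_path = path + "/" + segment(child, counts[child.tag])
        if child.children:
            yield from twigs(child, child_path)
        else:
            yield child_path  # empty element, e.g. <br> or <img>


def leaf_lines(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return list(twigs(builder.root))


def tree_diff(old_html, new_html):
    """Unified diff of the twig listings of two HTML pages."""
    return list(difflib.unified_diff(
        leaf_lines(old_html), leaf_lines(new_html),
        fromfile="old", tofile="new", lineterm=""))
```

Listing full paths keeps the diff line-oriented, so the same diff tooling works for both the text and the layout comparison; the cost is that one changed ancestor produces many changed lines.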
Just some ideas...
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re^2: Quantitative Change instead of Boolean
by titivillus (Scribe) on Jun 30, 2006 at 14:33 UTC