
Linebreaks and whitespace are not structural elements in HTML, so they cannot be used to divide text into chunks that are small enough, yet big enough, to give a meaningful diff between two versions of a document.

Hence the idea of using punctuation, a structural element inherent to the text itself, to break it up into units that can be compared.

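My approach would be to s/\s+/ /g and s/([.,:;!?])\s/$1\n/g, and then diff the resulting lines. (\s already matches newlines, and the /s modifier only changes what . matches, so neither [\s\n] nor /s is needed.)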
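A minimal sketch of that idea, in case it helps. The helper name chunk_text and the sample strings are made up for illustration, and it assumes the HTML tags have already been stripped or are harmless to compare as text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text as a list of punctuation-delimited chunks.
sub chunk_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;                # collapse every whitespace run (newlines included) to one space
    $text =~ s/([.,:;!?])\s/$1\n/g;    # break after punctuation that is followed by whitespace
    return split /\n/, $text;
}

# Made-up sample input: same wording, different line wrapping, one changed word.
my @old = chunk_text("Hello world,\n  this is a test. It has\ttwo sentences.");
my @new = chunk_text("Hello world, this is a test.\nIt has three sentences.");

print "old: $_\n" for @old;
print "new: $_\n" for @new;
```

Both versions come out as three chunks, and only the last one differs, regardless of how the source was wrapped. Write each list to a file, one chunk per line, and run diff on the two files, or hand both arrays to something like Algorithm::Diff from CPAN.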

cheers,
--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}