APerlJax has asked for the wisdom of the Perl Monks concerning the following question:

I've searched far and wide for a script/utility that performs a task I know many people have needed to do before.

I need something that can take two chunks of text (including newlines), diff them, and mark the changes. The GNU diff utility just won't cut it.

For example:

Block 1
----------

The quick brown
fox jumped over the
lazy dog while running
through my house.

Block 2
----------

The fast brown
fox jumped over the fat,
lazy dog while running.

Result
----------

The <<fast>><<Deleted: quick>> brown
fox jumped over the <<fat,>>
lazy dog while running<<.>>
<<Deleted: through my house.>>

Something to that effect. Does anyone know of something with this capability (or something that can be customized to come close) that I can tie into a Perl script?

Thanks!
~Roger

Replies are listed 'Best First'.
Re: Text (Version) Differencing
by bart (Canon) on Apr 18, 2006 at 19:16 UTC
    You appear to be looking for a word-by-word diff utility. Googling for one gives me few results. There appears to have been such a utility written in Perl at one time, but its domain name has dropped off the face of the earth.

    It appears to be quite popular for wikis, and a popular implementation seems to be the diff utility from Wikipedia, available in the MediaWiki package.

    GNU wdiff, as suggested in one of the pages I linked to above, seems to be using quite a crude methodology.

    Now, if you want to try building an implementation using Algorithm::Diff, I'd first do a crude diff using a line-by-line comparison. Once you've pinpointed the lines that differ, split those lines into components (words, punctuation, optionally whitespace too) and diff again between the word lists.
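
    The two-pass idea above could be sketched roughly as follows. This is a minimal sketch, not a finished tool: it assumes Algorithm::Diff is installed from CPAN, and the helper names tokens, mark_line and mark_text, as well as the tokenising regex, are made up for illustration. It uses sdiff, whose rows carry a flag of 'u' (unchanged), '+' (added), '-' (removed) or 'c' (changed).

```perl
use strict;
use warnings;
use Algorithm::Diff qw(sdiff);   # CPAN module; assumed to be installed

# Hypothetical tokeniser: words and single punctuation marks, whitespace dropped.
sub tokens { return $_[0] =~ /\w+|[^\w\s]/g }

# Pass 2: word-level diff of one pair of changed lines.
sub mark_line {
    my ($old, $new) = @_;
    my @out;
    for my $row (sdiff([ tokens($old) ], [ tokens($new) ])) {
        my ($op, $from, $to) = @$row;
        push @out, $op eq 'u' ? $to
                 : $op eq '+' ? "<<$to>>"
                 : $op eq '-' ? "<<Deleted: $from>>"
                 :              "<<$to>><<Deleted: $from>>";   # 'c'
    }
    return join ' ', @out;
}

# Pass 1: crude line-level diff; only changed line pairs are refined.
sub mark_text {
    my ($old, $new) = @_;
    my @out;
    for my $row (sdiff([ split /\n/, $old ], [ split /\n/, $new ])) {
        my ($op, $from, $to) = @$row;
        push @out, $op eq 'u' ? $to
                 : $op eq '+' ? "<<$to>>"
                 : $op eq '-' ? "<<Deleted: $from>>"
                 :              mark_line($from, $to);          # 'c'
    }
    return join "\n", @out;
}

print mark_text("The quick brown\nfox jumped", "The fast brown\nfox jumped"), "\n";
```

    Because tokens are rejoined with single spaces, punctuation ends up separated from the word before it. Keeping whitespace as tokens would avoid that at the cost of noisier diffs.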

Re: Text (Version) Differencing
by TedYoung (Deacon) on Apr 18, 2006 at 18:44 UTC

    I have used Algorithm::Diff with great success. Just iterate over the blocks it generates and you can mark up each block how you see fit.
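
    As an illustration of iterating over the blocks, here is a small sketch using the object-oriented interface of Algorithm::Diff (Next, Same, Items). The helper name mark_changes and the choice to split on whitespace are assumptions made for the example; the markup imitates the style in the original question.

```perl
use strict;
use warnings;
use Algorithm::Diff;   # CPAN module; assumed to be installed

# Hypothetical helper: diff two texts word by word and mark the changes.
sub mark_changes {
    my ($old, $new) = @_;

    # Tokenise on whitespace (an assumption; line breaks are not preserved).
    my @old = split ' ', $old;
    my @new = split ' ', $new;

    my $diff = Algorithm::Diff->new(\@old, \@new);
    my @out;
    while ($diff->Next) {
        if (my @same = $diff->Same) {
            push @out, @same;              # unchanged run of words
            next;
        }
        my @del = $diff->Items(1);         # words only in the old text
        my @add = $diff->Items(2);         # words only in the new text
        my $mark = '';
        $mark .= '<<' . join(' ', @add) . '>>'          if @add;
        $mark .= '<<Deleted: ' . join(' ', @del) . '>>' if @del;
        push @out, $mark;
    }
    return join ' ', @out;
}

print mark_changes('The quick brown fox', 'The fast brown fox'), "\n";
# prints: The <<fast>><<Deleted: quick>> brown fox
```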

    Ted Young

    ($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)
Re: Text (Version) Differencing
by Ido (Hermit) on Apr 18, 2006 at 18:44 UTC
Re: Text (Version) Differencing
by kvale (Monsignor) on Apr 18, 2006 at 18:47 UTC
    The above specification is ambiguous. The differencing seems to leave out spaces and newlines. Are you differencing only on words? What about punctuation? To solve this problem, you must first figure out which differences are significant and which are not.
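
    To make the ambiguity concrete, here is a small sketch of three ways the same text could be tokenised before diffing. The sample text and the regexes are illustrative assumptions; which option is right depends on which differences count as significant.

```perl
use strict;
use warnings;

my $text = "over the fat,\nlazy dog";

# Option 1: whitespace-separated chunks; punctuation stays glued to words.
my @chunks = split ' ', $text;          # 'over', 'the', 'fat,', 'lazy', 'dog'

# Option 2: words and punctuation as separate tokens; whitespace ignored.
my @words = $text =~ /\w+|[^\w\s]/g;    # 'over', 'the', 'fat', ',', 'lazy', 'dog'

# Option 3: whitespace runs are tokens too, so spaces and newlines become
# significant and the original text can be rebuilt with join('', @all).
my @all = $text =~ /\s+|\w+|[^\w\s]/g;

print scalar(@chunks), " ", scalar(@words), " ", scalar(@all), "\n";
```

    With option 1, "fat" and "fat," are different tokens; with option 2 they differ only by an added comma; with option 3 a moved line break shows up as a change as well.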

    -Mark

Re: Text (Version) Differencing
by traveler (Parson) on Apr 18, 2006 at 22:11 UTC
    If by "tie into a Perl script" you mean easily integrate, maybe not. kdiff3 seems to do the job, though. Perhaps you can use the code in your script.

    You could also wrap Algorithm::Diff with something like Text::Diff::HTML if you wanted.

Re: Text (Version) Differencing
by planetscape (Chancellor) on Apr 19, 2006 at 08:13 UTC