swiftone has asked for the wisdom of the Perl Monks concerning the following question:
My problem is that the client is very specific: the bolded sections need to be the word, phrase, sentence, or paragraph that is different, but not more. Omissions are unmarked (don't ask why). The CVS diff identifies changed lines between drafts, but I need to pull the changed words out of them. (Note that 'lines' in this case are actually paragraphs.)
My idea so far was to use Algorithm::Diff, which does element-by-element comparisons of two lists. I can split the lines into lists of words and run those through it. My trouble now is figuring out how to translate the result into bolding. This is not helped by the fact that at some point I have to run the line through HTML::Entities::encode_entities(), which rewrites characters such as & and < as entities, shifting everything around and breaking any bolding inserted by a regexp.
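Since punctuation changes should not drag the neighbouring word into the bold, one option is to tokenize each paragraph so that words, punctuation marks, and whitespace runs are separate tokens. You diff only the non-whitespace tokens but keep the whitespace, so joining the tokens rebuilds the paragraph exactly. Here is a minimal sketch in core Perl. The helper name tokenize is hypothetical, and the word rule (word characters plus internal apostrophes) is an assumption you would tune to the real drafts:

```perl
use strict;
use warnings;

# Hypothetical helper: split text into tokens for word-level diffing.
# A "word" is a run of \w characters, optionally joined by apostrophes
# (so "don't" stays whole); any other non-space character is its own
# token; whitespace runs are kept so join('', @tokens) gives back the
# original text exactly.
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\w+(?:'\w+)*|\s+|[^\w\s])/g;
}

my $para   = "Omissions are unmarked (don't ask why).";
my @tokens = tokenize($para);
my @words  = grep { /\S/ } @tokens;   # what would be handed to Algorithm::Diff

print join('|', @words), "\n";        # prints: Omissions|are|unmarked|(|don't|ask|why|)|.
print join('', @tokens) eq $para ? "round-trip ok\n" : "round-trip FAILED\n";
```

Each of the three alternatives matches at any character, so no text is skipped. That is what makes the round trip safe.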
Algorithm::Diff will give me output like:

    [ [ [ '-', 0, 'a' ] ],
      [ [ '+', 2, 'd' ] ],
      [ [ '-', 4, 'h' ], [ '+', 4, 'f' ] ],
      [ [ '+', 6, 'k' ] ],
      [ [ '-', 8, 'n' ], [ '-', 9, 'p' ],
        [ '+', 9, 'r' ], [ '+', 10, 's' ], [ '+', 11, 't' ] ],
    ]

In my case, though, the elements will be words rather than letters. Can anyone think of a relatively elegant way to mark changed sections in <B></B> tags, while still working with encode_entities and not getting confused by punctuation?
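In that diff output, each change is [ sign, index, element ]. For '+' entries the index points into the second (new) sequence, so the words to bold are exactly the '+' indices. The sketch below makes several assumptions. It uses a simple tokenizing rule (word runs, single punctuation marks, whitespace runs) and diffs only the non-whitespace tokens with Algorithm::Diff's diff. It encodes each token on its own with encode_entities, so the tags are never escaped and no regexp has to run over entity text. Finally, it joins bold runs that are separated only by whitespace. The helper names are hypothetical. Deleted words are simply dropped, which fits the "omissions are unmarked" rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Algorithm::Diff qw(diff);
use HTML::Entities qw(encode_entities);

# Assumed tokenizing rule: word runs (with internal apostrophes),
# whitespace runs, and single punctuation characters.
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\w+(?:'\w+)*|\s+|[^\w\s])/g;
}

# Hypothetical helper: return $new as HTML, with tokens that were added
# or changed relative to $old wrapped in <B></B>. Deletions are omitted.
sub mark_changes {
    my ($old, $new) = @_;
    my @old_words  = grep { /\S/ } tokenize($old);
    my @new_tokens = tokenize($new);

    # Positions in @new_tokens of the non-whitespace tokens we diff on.
    my @idx       = grep { $new_tokens[$_] =~ /\S/ } 0 .. $#new_tokens;
    my @new_words = @new_tokens[@idx];

    # For '+' changes the index is into @new_words; map it back to a
    # position in @new_tokens and remember it for bolding.
    my %bold;
    for my $hunk (diff(\@old_words, \@new_words)) {
        for my $change (@$hunk) {
            my ($sign, $pos) = @$change;
            $bold{ $idx[$pos] } = 1 if $sign eq '+';
        }
    }

    # Encode token by token, then add tags, so the markup is never encoded.
    my $html = '';
    for my $i (0 .. $#new_tokens) {
        my $tok = encode_entities($new_tokens[$i]);
        $html .= $bold{$i} ? "<B>$tok</B>" : $tok;
    }

    # Merge runs separated only by whitespace: "<B>a</B> <B>b</B>" -> "<B>a b</B>".
    $html =~ s{</B>(\s+)<B>}{$1}g;
    return $html;
}

print mark_changes('The quick fox & the dog.',
                   'The slow red fox & the dog!'), "\n";
# prints: The <B>slow red</B> fox &amp; the dog<B>!</B>
```

Because punctuation is its own token, "dog." becoming "dog!" bolds only the "!". Merging across whitespace is what turns consecutive changed words into one bolded phrase, or a whole bolded sentence when nothing inside it survived. An unchanged token in the middle still splits the run. The final merge regexp is safe because encode_entities turns any literal < in the text into &lt;, so the only "</B>" strings in $html are the ones the code added.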
Re: Finding changed words
by tye (Sage) on Sep 16, 2000 at 00:54 UTC
Re: Finding changed words
by merlyn (Sage) on Sep 15, 2000 at 23:00 UTC
Re: Finding changed words
by tye (Sage) on Sep 15, 2000 at 22:49 UTC
Re: Finding changed words
by extremely (Priest) on Sep 16, 2000 at 01:35 UTC
by tye (Sage) on Sep 16, 2000 at 01:47 UTC