As part of a text-to-html project I'm on, I need to display changes in the text from one version to the next in bold. Happily, I'm storing the text in CVS, so getting a diff is quite simple.

My problem is that the client is very specific: the bolded sections need to be the word, phrase, sentence, or paragraph that is different, but no more than that. Omissions are unmarked (don't ask why). The CVS diff identifies changed lines between drafts, but I need to pull the changed words out of them. (Note that 'lines' in this case are actually paragraphs.)

My idea so far was to use Algorithm::Diff, which does element-by-element comparisons of two lists. I can split the lines into lists of words and run those through it. My trouble now is figuring out how to translate the result into bolding. This is not aided by the fact that at some point I have to run the line through HTML::Entities::encode_entities(), which will move stuff around and break any bolding put in by a regexp.
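For concreteness, here is a minimal sketch of the splitting step described above. The two draft paragraphs are made-up sample data, and splitting on whitespace is only one possible choice (it leaves punctuation attached to the neighbouring word).

```perl
use strict;
use warnings;
use Algorithm::Diff qw(diff);

# Two hypothetical drafts of the same paragraph (sample data only).
my $old = 'The quick brown fox jumps over the lazy dog.';
my $new = 'The quick red fox leaps over the lazy dog.';

# Split on runs of whitespace; punctuation stays glued to its word.
my @old_words = split ' ', $old;
my @new_words = split ' ', $new;

# In list context diff() returns a list of hunks. Each hunk is an array
# of [ sign, position, element ] triples: '-' means the element was
# removed from the old list, '+' means it was added in the new list.
my @hunks = diff(\@old_words, \@new_words);

for my $hunk (@hunks) {
    for my $change (@$hunk) {
        my ($sign, $pos, $word) = @$change;
        print "$sign $pos $word\n";
    }
}
```

The positions on the '+' entries index into the new word list, which is the list that eventually gets displayed.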

Algorithm::Diff will give me output like:

[
  [ [ '-', 0, 'a' ] ],
  [ [ '+', 2, 'd' ] ],
  [ [ '-', 4, 'h' ], [ '+', 4, 'f' ] ],
  [ [ '+', 6, 'k' ] ],
  [ [ '-', 8, 'n' ], [ '-', 9, 'p' ], [ '+', 9, 'r' ], [ '+', 10, 's' ], [ '+', 11, 't' ] ],
]
Except that in my case, the letters will be words. Can anyone think of a relatively elegant way to mark changed sections in <B></B> tags, while still working with encode_entities and not getting confused by punctuation?
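One possible approach, offered as a sketch rather than a definitive answer: use Algorithm::Diff's sdiff() instead of diff(), so that every word of the new text comes back with a flag, and call encode_entities() on each word before any tag is added. That way the encoding never sees the markup. The helper name bold_changes() and the sample sentences are hypothetical, and this assumes whitespace-delimited words are fine-grained enough for the client.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(sdiff);
use HTML::Entities qw(encode_entities);

# bold_changes() is a hypothetical helper, not part of either module.
sub bold_changes {
    my ($old, $new) = @_;
    my @old_words = split ' ', $old;
    my @new_words = split ' ', $new;

    my @out;    # finished pieces of HTML
    my @run;    # consecutive changed words waiting to be wrapped

    # Wrap the pending run of changed words in one pair of tags.
    my $flush = sub {
        push @out, '<B>' . join(' ', @run) . '</B>' if @run;
        @run = ();
    };

    # sdiff() yields [ flag, old_element, new_element ] rows, where
    # flag is 'u' (unchanged), 'c' (changed), '+' (added), '-' (removed).
    for my $row (sdiff(\@old_words, \@new_words)) {
        my ($flag, $old_word, $new_word) = @$row;
        next if $flag eq '-';                   # omissions stay unmarked
        my $safe = encode_entities($new_word);  # escape before adding tags
        if ($flag eq 'u') {
            $flush->();
            push @out, $safe;
        }
        else {                                  # 'c' or '+'
            push @run, $safe;
        }
    }
    $flush->();
    return join ' ', @out;
}

print bold_changes('Tom & Jerry ran fast.',
                   'Tom & Jerry walked very slowly.'), "\n";
# prints: Tom &amp; Jerry <B>walked very slowly.</B>
```

Because adjacent changed words are collected into one run, a changed phrase or sentence gets a single pair of tags rather than one per word. Punctuation is compared as part of its word here, so "fast" versus "fast." counts as a change; splitting punctuation into its own tokens would be a refinement.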

In reply to Finding changed words by swiftone
