Those tags are strange in html because if they are started within a block-level html tag, they stop working if other block-level elements are inside.

I wonder if this might be a function of the particular browser you're using to display the content. I tried an example like the following in a Mozilla Firebird browser just now (on Mac OS X), and the "ins" and "del" segments ran their full extent, spanning and including the blockquote contents:

<html>
<P> Starting here
<ins> this is inserted text
  <blockquote> This is a block quote that is part of the inserted text. </blockquote>
  this is the end of the inserted text. </ins>
the original paragraph goes into details which get deleted.
<del> like this stuff here
  <blockquote> and the stuff in this blockquote is also deleted </blockquote>
  and this stuff too. </del>
That leaves this part in.
</html>

Now, I could imagine cases where the extent of an "ins" or "del" block might run afoul of the markup hierarchy -- e.g. if such a region started outside a blockquote and ended inside it:

<p> some text
  <del> some deleted text <!-- need to add end-del here -->
  <blockquote> <!-- need to add start-del here -->
    deleted portion of quote </del>
    retained portion of quote
  </blockquote>
  retained portion of paragraph.
</p>

Cases like this can be handled pretty well with HTML::TokeParser, given that you know what you need to look for. You could step through the data one "token" at a time, determine what sort of token it is (open tag, close tag, text content, comment) and throw in the extra tags where they're needed.

(This time, I'm not posting code because I think others, like jeffa, could do it better and quicker than I could, and because it's late and I should go to bed. But now it is an interesting problem, and I'd be curious to know what you're actually doing to get from the RCS diff data to the placement of "ins" and "del" tags...)
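
Update: for what it's worth, here is a bare-bones sketch of the token-stepping idea described above, just to make it concrete. Treat it as a starting point under stated assumptions, not a finished solution: split_change_tags() is a name I made up, the list of "block-level" tags is only a guess at what matters for your data, and it assumes at most one "ins" or "del" is open at a time (no nesting). It closes the open "ins"/"del" just before each block-level start or end tag and reopens it just after, so no marked region ever straddles a block boundary.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Assumption: only these tags count as block boundaries in this sketch.
my %is_block  = map { $_ => 1 } qw(p blockquote div);
# The change-marking tags that must not straddle a block boundary.
my %is_change = map { $_ => 1 } qw(ins del);

# split_change_tags() is a hypothetical helper, not part of HTML::TokeParser.
sub split_change_tags {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "could not set up the parser";
    my $open = '';    # name of the ins/del currently open, or '' if none
    my $out  = '';

    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'S' or $type eq 'E') {
            # Start tag: ["S", $tag, $attr, $attrseq, $text]
            # End tag:   ["E", $tag, $text]
            my $tag  = $token->[1];
            my $text = $token->[-1];    # original tag text is the last field
            if ($is_change{$tag}) {
                # Remember (or forget) which change tag is open.
                $open = $type eq 'S' ? $tag : '';
                $out .= $text;
            }
            elsif ($is_block{$tag} and $open) {
                # Close the change tag before the boundary, reopen it after.
                $out .= "</$open>$text<$open>";
            }
            else {
                $out .= $text;
            }
        }
        elsif ($type eq 'PI') {
            $out .= $token->[2];        # ["PI", $token0, $text]
        }
        else {
            $out .= $token->[1];        # text, comment or declaration
        }
    }
    return $out;
}

my $example = '<p> some text <del> some deleted text <blockquote> '
            . 'deleted portion of quote </del> retained portion of quote '
            . '</blockquote> retained portion of paragraph. </p>';
print split_change_tags($example), "\n";
```

On the overlapping example above, this should end the "del" right before the blockquote opens and start a fresh one just inside it. One known wart: if an "ins" or "del" sits directly against a block tag, you get an empty pair such as <del></del>, which you may want to strip in a second pass.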


In reply to Re: Re: Re: Regex: How do I use lookahead with search/replace? by graff
in thread Regex: How do I use lookahead with search/replace? by tunesmith
