in reply to Re^2: Munging Rendered HTML While Preserving Formatting
in thread Munging Rendered HTML While Preserving Formatting

There are problems whether the replacement text is longer, shorter, or the same length. If the text is longer, where do you put the extra characters? If the text is shorter, where do you remove characters? If the text is the same length, do you break it at the same places?

This is really only a problem when doing replacements with sentences instead of words. It is pretty unlikely that a word will be split in non-pathological cases. It can be argued that a tag is equivalent to a word break. The problem is actually pretty similar to doing munging across line breaks.

The only sane approach is to do replacement on individual text blocks. It might be possible to do replacements on multiple words in one of two ways. One is to use something like XSLT that works on the tree. The other is to write a regexp that matches whitespace and elements as word separators. For XML, this would not be too hard. The remaining hard part is maintaining the tags when doing the substitution.
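Here is a rough sketch of the regexp idea, in Python rather than Perl just to keep it short. It is only an illustration under some assumptions: the function name `replace_phrase` and the policy for uneven word counts are my own invention, the tag pattern is simplistic (no `>` inside attribute values), and it does not check word boundaries or avoid matching inside tag attributes. Each separator between matched words is captured and put back in the same slot, so the tags survive. Extra replacement words are piled into the last slot, and when there are fewer replacement words, the leftover separators keep only their tags.

```python
import re

# One "word separator": any run of whitespace and/or tags.
# Assumption: tags never contain ">" inside attribute values.
SEP = r"((?:\s|<[^>]+>)+)"
TAG = re.compile(r"<[^>]+>")


def replace_phrase(markup, phrase, replacement):
    """Replace phrase with replacement, treating tags as word breaks.

    Hypothetical helper: the original separators (tags included) are
    re-emitted between the new words so the markup stays balanced.
    """
    old_words = phrase.split()
    new_words = replacement.split()
    if not old_words:
        raise ValueError("phrase must contain at least one word")

    # The words of the phrase, joined by capturing separator groups.
    pattern = re.compile(SEP.join(re.escape(w) for w in old_words))

    def rebuild(match):
        seps = match.groups()
        head = new_words[:len(old_words) - 1]  # words that get their own slot
        tail = new_words[len(old_words) - 1:]  # extra words share the last slot
        out = []
        for i, sep in enumerate(seps):
            if i < len(head):
                out.append(head[i])
            if i + 1 < len(new_words):
                # Another word follows, so keep the separator as it was.
                out.append(sep)
            else:
                # No word follows: drop the whitespace, keep the tags.
                out.append("".join(TAG.findall(sep)))
        out.append(" ".join(tail))
        return "".join(out)

    return pattern.sub(rebuild, markup)


print(replace_phrase("a <i>quick</i> <b>brown</b> fox", "quick brown", "slow red"))
# a <i>slow</i> <b>red</b> fox
```

With a longer replacement such as "very slow red", the extra word lands inside the last element (`<b>slow red</b>`). With a shorter one you can end up with an empty element like `<b></b>`. Whether either result is what you want is exactly the open question above.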
