I usually clean that sort of thing out with HTML::Tidy. See also tidy's --bare and --clean options. Once you have sane HTML, further processing gets much easier.
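Something along these lines is roughly what I mean. Treat it as a minimal sketch, not tested against your data: it assumes HTML::Tidy is installed from CPAN and that the `bare` and `clean` config keys are accepted by your tidy build. The sub name `tidy_word_html` and the sample markup are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Tidy;

# Hypothetical helper: run a chunk of Word-generated HTML through tidy.
# 'bare' and 'clean' are assumed to map to tidy's --bare and --clean options.
sub tidy_word_html {
    my ($html) = @_;
    my $tidy = HTML::Tidy->new({
        bare      => 1,   # strip Word-specific cruft
        clean     => 1,   # replace presentational markup with style rules
        tidy_mark => 0,   # don't add the "generator" meta tag
    });
    return $tidy->clean($html);
}

# Made-up sample input; substitute your own document.
my $cleaned = tidy_word_html('<p class=MsoNormal>Hello<o:p></o:p></p>');
print $cleaned;
```

Run your regexes against the cleaned output, not the raw Word markup.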
Update: Word HTML to TWiki converter may also be of interest.
HTH,
planetscape