You are doing this the hard way! Parsing an HTML document with regular expressions is almost certainly doomed to bugginess, an abundance of edge cases, and inconsistent results. Even simple documents can trip you up, and MS Word documents dumped to HTML hardly qualify as simple. Please consider using an HTML parser package instead: it will correctly interpret tags nested within other tags.
Some modules to look at: HTML::Parser, and, more generally, anything a CPAN search for HTML parsing tools turns up.
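For comparison, here is a minimal sketch of the parser approach using HTML::Parser. It assumes that each "line" in the Word export is a `<p>` element and that the desired format is bold; the sub name `bold_first_words` is hypothetical. Everything other than the first word of each paragraph (tag attributes, comments, Word's conditional comments) is copied through unchanged.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: wrap the first word of every <p> element in <b>...</b>,
# copying all other markup and text through unchanged.
sub bold_first_words {
    my ($html) = @_;
    my $out     = '';
    my $pending = 0;    # true between a <p> start tag and its first word

    my $parser = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,    # deliver each run of text as a single chunk
        start_h => [ sub {
            my ($tag, $text) = @_;      # tagname is reported in lowercase
            $pending = 1 if $tag eq 'p';
            $out .= $text;              # original source of the start tag
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $pending = 0 if $tag eq 'p';
            $out .= $text;
        }, 'tagname, text' ],
        text_h => [ sub {
            my ($text) = @_;
            # Only the first non-whitespace run after <p> is wrapped;
            # whitespace-only chunks leave $pending set for the next chunk.
            if ($pending && $text =~ s{^(\s*)(\S+)}{$1<b>$2</b>}) {
                $pending = 0;
            }
            $out .= $text;
        }, 'text' ],
        # Comments, declarations and processing instructions pass through.
        default_h => [ sub { $out .= shift }, 'text' ],
    );
    $parser->parse($html);
    $parser->eof;
    return $out;
}

my $html = '<p class=MsoNormal>First line here</p><p><span>Second</span> line</p>';
print bold_first_words($html), "\n";
# prints: <p class=MsoNormal><b>First</b> line here</p><p><span><b>Second</b></span> line</p>
```

Because the parser reports tags and text as separate events, a first word wrapped in a `<span>` or `<font>` still gets bolded inside that inner tag, which is exactly the nesting case a regex tends to get wrong. Word's empty paragraphs (`<p>&nbsp;</p>`) will have their `&nbsp;` wrapped too; tighten the `\S+` pattern if that matters.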
In reply to Re: changing format of the first word of every line in an HTML doc
by ELISHEVA
in thread changing format of the first word of every line in an HTML doc
by sharkyflip