in reply to Refactoring Regular Expressions

Logie17, the module HTML::Entities deals with HTML-encoded text, i.e. text in which special characters are written as entities such as &lt;.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# The "<" is written as the entity &lt; so that there is something to decode.
my $htmlstring = 'This text contains an encoded "&lt;" tag';
print decode_entities($htmlstring), "\n";
This will output:
This text contains an encoded "<" tag
Update: Your remark on encoded HTML text may have sent me off in the wrong direction; apparently you just wanted to remove any tags.
It struck me that if you split the string into substrings, you may accidentally split in the middle of a tag, and as a result your regex would fail to match that tag.
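
To illustrate the splitting problem, here is a minimal sketch. The input string, the chunk size, and the deliberately naive tag regex are all made up for the demonstration (they are not taken from your code); the chunk size is chosen so that a chunk boundary falls inside the <b> tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input and chunk size, picked so that a boundary
# lands in the middle of the opening <b> tag.
my $html       = 'Some <b>bold</b> text';
my $chunk_size = 6;

# A deliberately naive tag-matching regex, for illustration only.
my $tag_re = qr/<[^>]*>/;

# Case 1: strip tags from the whole string in one go.
(my $whole = $html) =~ s/$tag_re//g;

# Case 2: cut the string into fixed-size chunks, strip each chunk, rejoin.
my @chunks = $html =~ /(.{1,$chunk_size})/gs;
s/$tag_re//g for @chunks;
my $pieces = join '', @chunks;

print "whole:  $whole\n";     # whole:  Some bold text
print "pieces: $pieces\n";    # pieces: Some <b>bold text
```

The first chunk ends with "<" and the second starts with "b>", so neither chunk contains a complete tag and the opening <b> survives. Running the substitution on the whole string, or splitting only at positions that cannot be inside a tag, avoids this.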