If you just want to de-HTMLify a document, the fastest way I know of is to run it through lynx -dump. This even gives you a bit of formatting.
If you really need to overwrite tags with spaces, and in the proper amount, then your approach of making a pattern first and then using it is not bad, but you're making two mistakes. First, you're only making a string, not a compiled regexp. You can very easily fix that by changing your first statement to:
my $pattern = qr/ ...whatever was here before... /;
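Secondly, you are doing the work twice: first you just match for tags, then you substitute. Don't do that. A single substitution both finds and replaces, and wrapping the pattern in parentheses guarantees that $1 is the whole match:
1 while $target_data =~ s/($pattern)/' ' x length $1/ge;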
(This is not tested! At all!)
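To make the two fixes concrete, here is a minimal sketch. The tag pattern `<[^>]*>` is only a naive stand-in of mine, since the pattern from the original post isn't shown, and `blank_tags` is a made-up helper name. The pattern is wrapped in a capture group so that `$1` is the whole match no matter what groups the pattern itself contains.

```perl
use strict;
use warnings;

# Hypothetical stand-in pattern: everything from '<' to the next '>'.
# It misfires on '>' inside attribute values, comments, scripts, etc.
my $pattern = qr/<[^>]*>/;

# blank_tags is an illustrative name, not from the original post.
sub blank_tags {
    my ($target_data) = @_;
    # Replace each match with as many spaces as it had characters,
    # so every remaining character keeps its original offset.
    $target_data =~ s/($pattern)/' ' x length $1/ge;
    return $target_data;
}

print blank_tags('<p>Hello, <b>world</b>!</p>'), "\n";
```

The result has the same length as the input, which is the point of padding with spaces instead of deleting.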
Finally, don't use regexps to parse HTML. Use an HTML::Parser.
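For comparison, here is a rough sketch of the same space-padding done with HTML::Parser instead of a regexp. It assumes the CPAN module is installed and uses its version 3 handler interface; `blank_markup` is a made-up name, and I have not run this against the original poster's data.

```perl
use strict;
use warnings;
use HTML::Parser ();    # CPAN module, not part of core Perl

# blank_markup is an illustrative name, not from the original post.
sub blank_markup {
    my ($html) = @_;
    my $out = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Text between tags is copied through untouched.
        text_h => [ sub { $out .= $_[0] }, 'text' ],
        # Everything else (start/end tags, comments, declarations, ...)
        # becomes a run of spaces as long as its original source text.
        default_h => [
            sub {
                my ($text) = @_;
                $text = '' unless defined $text;
                $out .= ' ' x length $text;
            },
            'text',
        ],
    );
    $parser->parse($html);
    $parser->eof;    # flush anything still buffered
    return $out;
}

print blank_markup('<p title="a > b">Hello, <b>world</b>!</p>'), "\n";
```

Unlike the naive pattern, the parser should cope with the `>` inside the quoted attribute value, which is exactly the kind of case that trips up regexp-based approaches.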
In reply to Re: Stripping HTML tags efficiently by gaal, in thread Stripping HTML tags efficiently by agynr