in reply to Re: encode HTML entities with a regexp
I'm wondering about what might be inside <a>..</a>, and whether they should be balanced. In case not, you could simply go for:

use strict;
my $input = ' This is sample <This is sample text> This is sample <Th<a> </a>is is sa<not a>mple text <a>this is sample text</a> sample text >';
my @arr = $input =~ m#<[^<>]+(?:<a>.+?</a>[^<>]*)?>#g;
print join "\n", @arr;
qr {<(?:[^<>]+|<a>|</a>)*>}
Otherwise, you might try:
qr {<(?:[^<>]+|<a>|</a>)*[^>]*>}
# declare $re first so the (??{$re}) block can refer to it recursively
my $re;
$re = qr{(?:[^<>]+|<a>(??{$re})?</a>)*};
my @arr = $input =~ m#<$re>#g;