A better solution might be ((?:<[^>]*>)+)
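Both (?:<[^>]*>)+ and plain (?:<[^>]*>) have a problem with a pair of tags that has no intervening blanks: the first lets such a pair match as a single unit, while the second excludes it from matching altogether. Either way, open-tag/close-tag synchronization can be lost, and the "pair" of tags between which blanks are eliminated is the wrong one. In the example below, the first two substitutions remove the blank between </foo> and <bar c=d> rather than the one between <bar c=d> and </bar>. This can be remedied by changing the quantifier on $blank from + to * (third substitution), but, as another reply has suggested, a proper HTML parser is really the best approach.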
>perl -wMstrict -le
"my $s = '<foo a=b></foo> <bar c=d> </bar> ';
my $t;
($t = $s) =~ s@((?:<[^>]*>)+) +((?:<[^>]*>)+)@$1$2@g;
print qq{'$t'};
my $tag = qr{ < [^>]* > }xms;
my $blank = qr{ [ ] }xms;
($t = $s) =~ s{ ($tag) $blank+ ($tag) }{$1$2}xmsg;
print qq{'$t'};
($t = $s) =~ s{ ($tag) $blank* ($tag) }{$1$2}xmsg;
print qq{'$t'};
print qq{'$s'};
"
'<foo a=b></foo><bar c=d> </bar> '
'<foo a=b></foo><bar c=d> </bar> '
'<foo a=b></foo> <bar c=d></bar> '
'<foo a=b></foo> <bar c=d> </bar> '
Update: Added another example using the * quantifier.
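For what it's worth, the * quantifier still pairs up any two adjacent tags, so input such as '<i>x</i> <b> </b>' would again lose the blank between </i> and <b>. One possible way to keep the regex version in sync, short of a real parser, is to require that the closing tag carry the same name as the opening tag via a backreference. The following is only a rough sketch under that assumption: the name squeeze_empty_pairs is made up for illustration, it handles only an element that contains nothing but blanks, and it shares every other weakness of parsing HTML with regexes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): remove blanks only
# between an opening tag and the closing tag of the same name.
sub squeeze_empty_pairs {
    my ($html) = @_;
    $html =~ s{
        ( < (\w+) \b [^>]* > )   # capture 1: opening tag, capture 2: tag name
        [ ]*                     # any run of blanks, possibly none
        ( </ \2 > )              # capture 3: closing tag with the same name
    }{$1$3}xmsg;
    return $html;
}

print "'", squeeze_empty_pairs('<foo a=b></foo> <bar c=d> </bar> '), "'\n";
print "'", squeeze_empty_pairs('<i>x</i> <b> </b>'), "'\n";
```

On the sample string this should print the same result as the * version above, and for the second string it should leave the blank between </i> and <b> alone while still emptying <b> </b>.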