in reply to Re: spaces removed in backreference
in thread spaces removed in backreference
A better solution might be ((?:<[^>]*>)+)
Either (?:<[^>]*>)+ or just plain (?:<[^>]*>) can destroy open-tag/close-tag synchronization. With the + quantifier, a pair of tags with no intervening blanks matches as a single run; with the plain version, such a pair is excluded from matching. In either case, the "pair" of tags between which blanks are eliminated is incorrect: in the example below, the blank between </foo> and <bar c=d> is removed, while the blank between <bar c=d> and </bar> is left in place. This can be remedied by changing the quantifier on $blank from + to *, but, as another reply has suggested, a proper HTML parser is really the best approach.
    >perl -wMstrict -le
    "my $s = '<foo a=b></foo> <bar c=d> </bar> ';
     my $t;

     ($t = $s) =~ s@((?:<[^>]*>)+) +((?:<[^>]*>)+)@$1$2@g;
     print qq{'$t'};

     my $tag   = qr{ < [^>]* > }xms;
     my $blank = qr{ [ ] }xms;

     ($t = $s) =~ s{ ($tag) $blank+ ($tag) }{$1$2}xmsg;
     print qq{'$t'};

     ($t = $s) =~ s{ ($tag) $blank* ($tag) }{$1$2}xmsg;
     print qq{'$t'};

     print qq{'$s'};
    "
    '<foo a=b></foo><bar c=d> </bar> '
    '<foo a=b></foo><bar c=d> </bar> '
    '<foo a=b></foo> <bar c=d></bar> '
    '<foo a=b></foo> <bar c=d> </bar> '
Update: Added another example using * quantifier.
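To make the synchronization problem easier to see, here is a rough sketch that lists which tags each pattern actually pairs up. The pairs_matched() helper is hypothetical, written only for this illustration. The last substitution is not from the posts above: it is one possible alternative, assuming the goal is to drop blanks only between an open tag and its own close tag, that uses a backreference to the tag name instead of relying on the order in which tags are consumed. It is still no substitute for a real HTML parser.

```perl
use strict;
use warnings;

my $s     = '<foo a=b></foo> <bar c=d> </bar> ';
my $tag   = qr{ < [^>]* > }xms;
my $blank = qr{ [ ] }xms;

# Hypothetical helper: return the tag pairs a two-capture pattern matches.
sub pairs_matched {
    my ($string, $regex) = @_;
    my @pairs;
    while ($string =~ m{$regex}g) {
        push @pairs, "$1 + $2";
    }
    return @pairs;
}

# $blank+ skips the blank-free pair, then pairs a close tag with an open tag.
my @plus = pairs_matched($s, qr{ ($tag) $blank+ ($tag) }xms);
print join(', ', @plus), "\n";    # </foo> + <bar c=d>

# $blank* consumes the blank-free pair first, so pairing stays in step.
my @star = pairs_matched($s, qr{ ($tag) $blank* ($tag) }xms);
print join(', ', @star), "\n";    # <foo a=b> + </foo>, <bar c=d> + </bar>

# Assumed alternative: tie the close tag to the open tag's name with \2.
(my $t = $s) =~ s{ (<(\w+)\b[^>]*>) $blank+ (</\2>) }{$1$3}xmsg;
print "'$t'\n";                   # '<foo a=b></foo> <bar c=d></bar> '
```

On this input the backreference version gives the same result as the * quantifier, but it would not pair a close tag with an unrelated open tag that happens to follow it. Nested or malformed markup will still defeat it.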