in reply to Question for regex experts

How about this (a suggestion from my colleague)?
$_ = "test;a <>amp;kap;&da;ma; &amp alfa&romeo & mich;";
while ( s/([&][^ ]*);(?!( |&|$))/\1AMPSEMICOLON/ ) {};
s/AMPSEMICOLON/;\&/g;
# Output: test;a &<>&&kap;&da;&ma; &amp alfa&romeo & mich;
As you see, this is not restricted to a particular list of html codes, and that is what I want!
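
For anyone reading along, here is a commented sketch of the same placeholder idea wrapped in a subroutine. The name add_missing_ampersands and the sample string are made up for illustration, and the sketch assumes the text never contains the literal marker AMPSEMICOLON.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes 'AMPSEMICOLON' never occurs in the input.
sub add_missing_ampersands {
    my ($text) = @_;
    my $marker = 'AMPSEMICOLON';

    # Inside a run of non-space characters that starts with '&', swap one ';'
    # per pass for the marker (rightmost first, because [^ ]* is greedy),
    # unless that ';' is followed by a space, an '&', or the end of the string.
    1 while $text =~ s/(&[^ ]*);(?!(?: |&|$))/$1$marker/;

    # Turn every marker back into ';' plus the '&' that was missing.
    $text =~ s/$marker/;&/g;
    return $text;
}

print add_missing_ampersands('a &foo;bar;baz; b'), "\n";   # a &foo;&bar;&baz; b
```

The loop is needed because each pass fixes only one semicolon. Note that a semicolon before a space is left alone, so the last entity in a run is never followed by a stray '&'.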

Re^2: Question for regex experts
by AnomalousMonk (Archbishop) on Jun 27, 2013 at 13:41 UTC
    ... not restricted to a particular list of html codes ...

    I'm not sure that's really such a good idea (it should be easy to get a list of all HTML entities you could possibly be interested in), but here's a generalization:

    >perl -wMstrict -le
    "my $bad = 'x &foo;xx;yz;qwe; y &de;fghj;h; z &amp;&gt;&lt; y lt;gt; z';
     print qq{'$bad'};
     ;;
     my $tity = qr{ [[:alpha:]]+ ; }xms;
     ;;
     (my $fixed = $bad) =~
        s{ (?: (?: \G (?<! \A)) | &) $tity \K (?= $tity) }
         '&'xmsg;
     print qq{'$fixed'};
    "
    'x &foo;xx;yz;qwe; y &de;fghj;h; z &amp;&gt;&lt; y lt;gt; z'
    'x &foo;&xx;&yz;&qwe; y &de;&fghj;&h; z &amp;&gt;&lt; y lt;gt; z'
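
The one-liner above can also be written as a small script, which leaves room to comment each piece of the pattern. This is only a sketch of the same substitution: the subroutine name fix_entities is invented here, and \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# An entity body: one or more letters and a semicolon (same idea as $tity).
my $entity = qr{ [[:alpha:]]+ ; }x;

# Hypothetical wrapper around the substitution from the one-liner.
sub fix_entities {
    my ($text) = @_;
    $text =~ s{
        (?: (?: \G (?<! \A)) | &)  # resume where the last match ended
                                   # (but not at string start), or find a real '&'
        $entity                    # an entity body that has, or just got, its '&'
        \K                         # keep what was matched; replace nothing before here
        (?= $entity )              # only when another bare entity body follows
    }{&}xg;
    return $text;
}

my $bad = 'x &foo;xx;yz;qwe; y &de;fghj;h; z &amp;&gt;&lt; y lt;gt; z';
print "'", fix_entities($bad), "'\n";
```

Because a chain can only start at a literal '&', the bare 'lt;gt;' near the end is left untouched, exactly as in the output shown above.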