Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

my $html = qq|
<!-- HTML comment#1 
Some comment
-->
Some text
<!-- HTML comment#2
Some comment
-->
|;

I want to remove all the HTML comments (like the ones shown above) from $html, i.e. replace them with an empty string. The variable may contain newlines as well. Can somebody tell me the regex? I tried the one below but could not get it working:
$html =~ s/<!--.*-->//mg;

Replies are listed 'Best First'.
Re: removing html comments from the page source
by Herkum (Parson) on Feb 26, 2007 at 12:51 UTC
    HTML::Clean will do what you are asking for and it will probably be easier than writing your own code.
Re: removing html comments from the page source
by varian (Chaplain) on Feb 26, 2007 at 10:58 UTC
    You need the s modifier so that the dot matches a newline too, and a non-greedy .*? so that each match stops at the first closing -->. This regex will do it for you:
    #!/usr/bin/perl -w
    my $html = qq|
    <!-- HTML comment#1
    Some comment
    -->
    Some text
    <!-- HTML comment#2
    Some comment
    -->
    |;
    my $comment = '<!--.*?-->';
    $html =~ s/$comment//sg;
    print "=$html=\n";
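To see why the original attempt with /m did nothing, here is a minimal sketch comparing the substitution with and without /s. The sample string and the variable names ($without_s, $with_s) are made up for illustration; only the pattern comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input: two comments that each span a newline.
my $html = "<!-- one\ncomment -->\nSome text\n<!-- two\ncomment -->\n";

# Without /s the dot cannot cross a newline, so these multi-line
# comments never match and the string is left untouched.
(my $without_s = $html) =~ s/<!--.*?-->//g;

# With /s the dot also matches "\n". Note that /m only changes the
# meaning of ^ and $, so it has no effect on this pattern.
(my $with_s = $html) =~ s/<!--.*?-->//sg;

print "without /s: ", ($without_s eq $html ? "unchanged" : "changed"), "\n";
print "with /s: [$with_s]\n";
```

Only the second substitution removes anything; the text between the comments and the surrounding newlines is kept.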
Re: removing html comments from the page source
by wfsp (Abbot) on Feb 26, 2007 at 13:44 UTC
    In addition to Herkum's suggestion there is an example script in HTML::TokeParser::Simple called "Stripping comments" which may be worth having a look at.
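For reference, a parser-based approach along the lines of that example might look like the sketch below. It assumes HTML::TokeParser::Simple is installed from CPAN; the sample markup and the $clean variable are illustrative and not taken from the module's documentation.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input with a multi-line comment.
my $html = "<p>Some text</p><!-- a\ncomment --><p>More</p>";

# Passing a scalar reference makes the parser read from the string.
my $parser = HTML::TokeParser::Simple->new(\$html);

my $clean = '';
while (my $token = $parser->get_token) {
    next if $token->is_comment;   # skip comment tokens
    $clean .= $token->as_is;      # keep everything else verbatim
}

print $clean, "\n";
```

A tokenizer of this kind decides for itself where each comment starts and ends, which is usually more robust than a regex when the input is real-world HTML.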
Re: removing html comments from the page source
by hangon (Deacon) on Feb 26, 2007 at 17:52 UTC

    While the above comments should solve your problem, for your further education you might want to read up on the concept of *greed* in regular expressions.
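
As a small illustration of greed, the sketch below runs a greedy and a non-greedy pattern over the same string. The input and the variable names ($greedy, $lazy) are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input: two comments with text between them.
my $html = "<!-- a -->keep me<!-- b -->";

# Greedy: .* grabs as much as it can, so the match runs from the first
# "<!--" to the LAST "-->" and swallows the text in between.
(my $greedy = $html) =~ s/<!--.*-->//sg;

# Non-greedy: .*? stops at the first "-->" it can reach, so each
# comment is matched and removed separately.
(my $lazy = $html) =~ s/<!--.*?-->//sg;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

The greedy version leaves an empty string, while the non-greedy one leaves "keep me", which is why varian's pattern uses .*? rather than .*.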