Matching HTML Comments

stuaxo has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to match special comments in an HTML page retrieved from $response->content, so I do:

if ($response->content =~ /<!-- \[error\] -->(.+)<!-- \[\/error\] -->/m)
{
    print "$1\n";
}

When fed an appropriate HTML file, it returns the text at the end of this post. What I really want is just the text inside each individual `<!-- [error] -->` ... `<!-- [/error] -->` pair, instead of everything from the first `<!-- [error] -->` to the last `<!-- [/error] -->`.

mssql_execute(): message: The conversion of char data type to smalldatetime data type resulted in an out-of-range smalldatetime value. (severity 16)<!-- [/error] --><br />
<!-- [error] -->mssql_execute(): message: An error ocurred while converting input datatypes. (severity 17)<!-- [/error] --><br />
<!-- [error] -->BeginSearch failed - unable to obtain search id.<!-- [/error] --><br />
<!-- [error] -->The template section specified does not exist.

failed to open http://localhost/  retrys:0... waiting for 1 seconds...
mssql_execute(): message: The conversion of char data type to smalldatetime data type resulted in an out-of-range smalldatetime value. (severity 16)


Re: Matching HTML Comments by allolex (Curate) on Mar 02, 2004 at 13:18 UTC
Because regular expressions tend to break as soon as they meet unexpected input, a lot of the monks (like tinita) will recommend using something like Ovid's HTML::TokeParser::Simple for this sort of thing. The module's documentation includes an example specifically for dealing with comments. I hope that helps. -- Allolex
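A minimal sketch of the token-parser approach, assuming HTML::TokeParser::Simple is installed from CPAN. `error_messages` is a hypothetical helper name, not part of the module. It walks the token stream, starts collecting raw HTML at a `<!-- [error] -->` comment token and stops at `<!-- [/error] -->`, so each message comes back separately no matter what markup sits between the pairs.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;   # CPAN module, assumed to be installed

# Hypothetical helper: return the raw HTML found between each
# <!-- [error] --> ... <!-- [/error] --> comment pair, in document order.
sub error_messages {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new(\$html);   # parse from a scalar ref
    my (@messages, $current);   # $current is defined only while inside a pair

    while (my $token = $parser->get_token) {
        if ($token->is_comment) {
            my $comment = $token->as_is;   # full comment text, delimiters included
            if ($comment =~ /^<!--\s*\[error\]\s*-->$/) {
                $current = '';             # opening marker: start collecting
                next;
            }
            if ($comment =~ m{^<!--\s*\[/error\]\s*-->$}) {
                push @messages, $current if defined $current;
                undef $current;            # closing marker: stop collecting
                next;
            }
        }
        # Inside a pair, keep every token (text, tags, other comments) verbatim.
        $current .= $token->as_is if defined $current;
    }
    return @messages;
}
```

With LWP this would be called as `my @errors = error_messages($response->content);`. Matching whole comment tokens avoids the greedy-match problem entirely, because the parser has already split the page at every comment boundary.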
Re: Matching HTML Comments by tinita (Parson) on Mar 02, 2004 at 13:04 UTC
Besides making the regex less greedy, as shown by borisz, you might want to take a look at HTML::Parser or the HTML::Tree package on CPAN.
Re: Matching HTML Comments by matija (Priest) on Mar 02, 2004 at 13:03 UTC
This is because the `.+` quantifier is greedy: it matches the longest string it possibly can (this is generally a good thing). A quick fix is to change `.+` to `.+?`, which makes it non-greedy: it matches the shortest string that still lets the rest of the pattern succeed. You can read more about this in `perldoc perlre`.
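A small sketch of the difference, using a made-up two-message string in place of the real page content. Both patterns are the one from the question; only the quantifier changes.

```perl
use strict;
use warnings;

# Hypothetical sample input with two error sections on one line.
my $html = '<!-- [error] -->first<!-- [/error] --><br />'
         . '<!-- [error] -->second<!-- [/error] -->';

# Greedy: .+ first runs to the end of the string, then backtracks only until
# the LAST <!-- [/error] --> can match, swallowing everything in between.
my ($greedy) = $html =~ /<!-- \[error\] -->(.+)<!-- \[\/error\] -->/;

# Non-greedy: .+? grows one character at a time and stops at the FIRST
# <!-- [/error] --> that lets the whole pattern succeed.
my ($lazy) = $html =~ /<!-- \[error\] -->(.+?)<!-- \[\/error\] -->/;

print "greedy: $greedy\n";   # first<!-- [/error] --><br /><!-- [error] -->second
print "lazy:   $lazy\n";     # first
```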
Re: Matching HTML Comments by borisz (Canon) on Mar 02, 2004 at 13:00 UTC
Perhaps your regex is too greedy. Try this: `if ($response->content =~ /<!-- \[error\] -->(.+?)<!-- \[\/error\] -->/m) { print "$1\n"; }` Update: Corion notes that my `?` was outside the parentheses. Boris
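An `if` with a non-greedy pattern still prints only the first message. A minimal sketch for getting all of them, assuming the page text is already in a scalar (`$content` and its sample text are hypothetical stand-ins for `$response->content`): the `/g` modifier in list context returns every capture, and `/s` lets `.` match newlines, so a message that wraps onto a second line is still found. The `/m` modifier only changes how `^` and `$` behave, so it does not help here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $response->content; the first message spans two lines.
my $content = <<'HTML';
<!-- [error] -->The template section specified
does not exist.<!-- [/error] --><br />
<!-- [error] -->BeginSearch failed - unable to obtain search id.<!-- [/error] --><br />
HTML

# /g in list context: one element per match (the text captured by the group).
# /s: '.' also matches "\n", so multi-line messages are captured whole.
my @errors = $content =~ m{<!-- \[error\] -->(.+?)<!-- \[/error\] -->}gs;

print "$_\n" for @errors;
```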