stuaxo has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to match special comments in an HTML page retrieved from $response->content, so I do:
if ($response->content =~ /<!-- \[error\] -->(.+)<!-- \[\/error\] -->/m) { print "$1\n"; }
When fed an appropriate HTML file, it returns the text at the end of this post. What I really want is the text between each individual <!-- [error] -->stuff<!-- [/error] --> pair, rather than everything from the first <!-- [error] --> to the last <!-- [/error] -->.
mssql_execute(): message: The conversion of char data type to smalldatetime data type resulted in an out-of-range smalldatetime value. (severity 16)<!-- [/error] --><br /> <!-- [error] -->mssql_execute(): message: An error ocurred while converting input datatypes. (severity 17)<!-- [/error] --><br /> <!-- [error] -->BeginSearch failed - unable to obtain search id.<!-- [/error] --><br /> <!-- [error] -->The template section specified does not exist. failed to open http://localhost/ retrys:0... waiting for 1 seconds... mssql_execute(): message: The conversion of char data type to smalldatetime data type resulted in an out-of-range smalldatetime value. (severity 16)

Replies are listed 'Best First'.
Re: Matching HTML Comments
by allolex (Curate) on Mar 02, 2004 at 13:18 UTC

    Because regular expressions tend to break when they meet unexpected data, many of the monks (like tinita) will recommend that you use something like Ovid's HTML::TokeParser::Simple for this sort of thing. The module's documentation includes an example specific to dealing with comments.

    I hope that helps.

    --
    Allolex
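
    As a rough illustration of the parser-based approach suggested above, here is a minimal sketch. It assumes HTML::TokeParser::Simple is installed from CPAN and that the markers always appear as well-formed comments; the helper name extract_errors_with_parser is made up for this example and is not part of the module.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;   # CPAN module, assumed to be installed

# Hypothetical helper: collects the raw markup found between each
# <!-- [error] --> comment and the next <!-- [/error] --> comment.
sub extract_errors_with_parser {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new(string => $html);
    my (@errors, $current);

    while (my $token = $parser->get_token) {
        if ($token->is_comment) {
            my $comment = $token->as_is;
            if ($comment =~ m{\[/error\]}) {
                # Closing marker: store what has been collected so far.
                push @errors, $current if defined $current;
                undef $current;
            }
            elsif ($comment =~ /\[error\]/) {
                # Opening marker: start collecting.
                $current = '';
            }
        }
        elsif (defined $current) {
            # Inside a marked region: keep the token exactly as it appeared.
            $current .= $token->as_is;
        }
    }
    return @errors;
}
```

    Calling extract_errors_with_parser($response->content) should return one list element per marked region. Other comments are ignored, so unrelated markup between the regions does not end up in the result.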

Re: Matching HTML Comments
by tinita (Parson) on Mar 02, 2004 at 13:04 UTC
    Besides making the regex less greedy, as shown by borisz, you might want to take a look at HTML::Parser or the HTML::Tree package on CPAN.
Re: Matching HTML Comments
by matija (Priest) on Mar 02, 2004 at 13:03 UTC
    This is because the + quantifier in .+ is greedy: it matches the longest string it possibly can. (This is generally a good thing.)

    A quick fix would be to change .+ to .+?.

    You can read more about this issue in perldoc perlre.
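
    To make the difference concrete, here is a small sketch using only core Perl. The helper name extract_errors and the sample input are invented for illustration. Beyond the .+? fix, it adds two things not mentioned in this thread: the /g modifier in a while loop, so every pair is visited rather than just the first, and /s, so that . can also match a newline in case an error message spans several lines.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of every marked error region.
sub extract_errors {
    my ($html) = @_;
    my @errors;
    # .+? is non-greedy, so each match ends at the nearest closing marker.
    # /g resumes after the previous match; /s lets . match newlines too.
    while ($html =~ m{<!-- \[error\] -->(.+?)<!-- \[/error\] -->}sg) {
        push @errors, $1;
    }
    return @errors;
}

# Made-up sample standing in for $response->content.
my $sample = join "\n",
    '<!-- [error] -->first problem<!-- [/error] --><br />',
    '<!-- [error] -->second',
    'problem<!-- [/error] --><br />';

# Print each region followed by a separator line.
print "$_\n---\n" for extract_errors($sample);
```

    With the greedy (.+) in the same loop, the single match would run from the first opening marker to the last closing marker, which is the behaviour described in the question.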

Re: Matching HTML Comments
by borisz (Canon) on Mar 02, 2004 at 13:00 UTC
    Perhaps your regex is too greedy. Try this:
    if ($response->content =~ /<!-- \[error\] -->(.+?)<!-- \[\/error\] -->/m) { print "$1\n"; }
    Update: Corion notes that my ? was outside the parentheses.
    Boris