in reply to Re: Re: Removing html comments with regex
in thread Removing html comments with regex

It's easier because it's already written, it works, and you don't have to know how it works to use it.

The problem with regular expressions and HTML is that HTML isn't required to be regular. It's not hard to write a simple regex that processes simple HTML, but it's just as easy to write realistic HTML that a regex won't process.
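To make that concrete, here is a minimal sketch of how a simple comment-stripping regex can go wrong. The pattern, the helper name `strip_comments_naive`, and the sample markup are all made up for illustration; none of it comes from the earlier posts in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the "obvious" regex, which deletes everything
# from "<!--" up to the nearest "-->".
sub strip_comments_naive {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;
    return $html;
}

# Simple HTML: the regex does what you'd hope.
my $simple     = '<p>Hello</p><!-- note --><p>World</p>';
my $simple_out = strip_comments_naive($simple);

# Realistic HTML: the "comment" is really the contents of a
# JavaScript string literal, and the regex removes it anyway.
my $tricky     = '<script>var s = "<!-- not a comment -->";</script>';
my $tricky_out = strip_comments_naive($tricky);

print "$simple_out\n";
print "$tricky_out\n";
```

The first call returns the markup with the comment gone. The second empties the string literal inside the script, because the regex has no idea it is looking at script content rather than markup. A parser that tracks where it is in the document doesn't have that problem.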

Re: Re: Re: Re: Removing html comments with regex
by Ovid (Cardinal) on Aug 23, 2003 at 06:15 UTC

    And the good news is, I'm just about finished with a brand new version of Regexp::Token. I need to write the docs and start creating weird edge cases. When done, it should allow you to safely remove comments.

    my $html_comment = Regexp::Token->create($some_comment_token);
    $html =~ s/$html_comment//g;

    Or do things like:

        $html =~ /some text($p_tag)more text/;

    I'm leaving for the beach tomorrow morning, but by Monday I hope to have this posted. Too bad it's ridiculously slow.

    Cheers,
    Ovid

    New address of my CGI Course.