in thread Removing html comments with regex

Please forgive my novice questions, but how is all of that easier and faster than a one-liner? Aren't regexes supposed to be powerful and fast? I'm still really curious why the regex won't work. I was just thinking: maybe the modules aren't haunted, but something is making the comments disappear.

Re: Re: Re: Removing html comments with regex
by chromatic (Archbishop) on Aug 23, 2003 at 05:44 UTC

    It's easier because it's already written, it works, and you don't have to know how it works to use it.

    The problem with regular expressions and HTML is that HTML isn't required to be regular. It's not hard to write a simple regex that processes simple HTML, but it's just as easy to write realistic HTML that the regex won't process.
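
To make that concrete, here is a minimal sketch. The sample markup and the substitution are assumptions for illustration (the original poster's regex isn't quoted in this reply); the script block uses the common trick of hiding JavaScript from old browsers inside comment markers.

```perl
use strict;
use warnings;

# Hypothetical sample input: one genuine comment, plus a script block
# whose code is wrapped in comment markers for the benefit of old browsers.
my $html = join "\n",
    '<p>Keep me</p>',
    '<!-- a real comment -->',
    '<script type="text/javascript">',
    '<!--',
    'document.write("hello");',
    '// -->',
    '</script>',
    '';

# An assumed "one-liner" approach: drop everything between <!-- and -->.
# /s lets . match newlines, /g repeats for every match.
( my $stripped = $html ) =~ s/<!--.*?-->//gs;

# The real comment is gone, but so is the JavaScript inside the script block.
print $stripped;
```

The pattern cannot tell that the second `<!-- ... -->` sits inside a script element, so it deletes working code along with the comment. A longer regex could cover this one case, but the next page will break it some other way.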

      And the good news is, I'm just about finished with a brand new version of Regexp::Token. I need to write the docs and start creating weird edge cases. When done, it should allow you to safely remove comments.

          my $html_comment = Regexp::Token->create($some_comment_token);
          $html =~ s/$html_comment//g;

      Or do things like:

          $html =~ /some text($p_tag)more text/;

      I'm leaving for the beach tomorrow morning, but by Monday I hope to have this posted. Too bad it's ridiculously slow.

      Cheers,
      Ovid

      New address of my CGI Course.

Re: Re: Re: Removing html comments with regex
by allolex (Curate) on Aug 23, 2003 at 07:12 UTC

    In addition to chromatic's remarks, it's also a good idea to learn how to use the HTML parser modules because you will almost certainly run into an application for them later. Using them to delete comments is one thing, but if you ever want to do anything more complex with HTML, you'll already be familiar with this tool. I recommend Ovid's very intuitive HTML::TokeParser::Simple. (I was in your situation not too long ago.) :)
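
For what it's worth, a minimal sketch of the comment-stripping case with HTML::TokeParser::Simple might look like this. The sample markup and variable names are made up for illustration; `get_token`, `is_comment`, and `as_is` are the module's documented methods, and passing a reference to a string to `new` parses that string instead of a file.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input: one genuine comment, plus a script block
# whose code is wrapped in comment markers.
my $html = join "\n",
    '<p>Keep me</p>',
    '<!-- a real comment -->',
    '<script type="text/javascript">',
    '<!--',
    'document.write("hello");',
    '// -->',
    '</script>',
    '';

# A reference to a scalar tells the parser to read from the string.
my $parser = HTML::TokeParser::Simple->new( \$html );

my $clean = '';
while ( my $token = $parser->get_token ) {
    next if $token->is_comment;    # skip comment tokens only
    $clean .= $token->as_is;       # copy every other token verbatim
}

print $clean;
```

Because the parser hands back the contents of a script element as text rather than as a comment token, the loop should leave the JavaScript alone while dropping the genuine comment. The same loop extends naturally to other jobs by swapping `is_comment` for another test such as `is_start_tag`.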

    --
    Allolex