in reply to Comment Removal

Considering that Regexp::Common defines an HTML comment match via the following regex, it's probably best to stick with a module like it, so as to catch any nasty "unusual" cases of valid HTML comments.

# not useful by itself, see update below
(?k:(?k:<!)(?k:(?:--(?k:[^-]*(?:-[^-]+)*)--\s*)*)(?k:>))

update: I hadn't ever seen the (?k:) usage before, and thanks to benizi, I now see it is simply used internally within Regexp::Common's collection of modules to allow for optional capturing of segments of the common regular expression being used. From the docs:

To specify such "optional" capturing parentheses within the regular expression associated with create, use the notation (?k:...). Any parentheses of this type will be converted to (...) when the -keep flag is specified, or (?:...) when it is not.
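
A minimal sketch of that conversion, using only core Perl. The helper name expand_k is hypothetical and this is not how Regexp::Common is implemented internally; it simply mimics the documented behaviour with a plain substitution on the pattern text quoted above.

```perl
use strict;
use warnings;

# Pattern text as quoted above, still containing the (?k:...) markers.
my $template = '(?k:(?k:<!)(?k:(?:--(?k:[^-]*(?:-[^-]+)*)--\s*)*)(?k:>))';

# expand_k is a hypothetical helper (not part of Regexp::Common).
# It turns every "(?k:" into "(" when $keep is true, or "(?:" otherwise.
sub expand_k {
    my ($pattern, $keep) = @_;
    my $open = $keep ? '(' : '(?:';
    (my $expanded = $pattern) =~ s/\(\?k:/$open/g;
    return $expanded;
}

my $plain = expand_k($template, 0);   # non-capturing form
my $keep  = expand_k($template, 1);   # capturing form

print "without -keep: $plain\n";
print "with -keep:    $keep\n";

# With the capturing form, $1 is the whole comment and $4 is its body.
my $html = 'a<!-- one -->b';
if ($html =~ /$keep/) {
    print "whole comment: [$1]\n";
    print "comment body:  [$4]\n";
}
```

The non-capturing form printed here should have the same structure as the regex that Data::Dumper shows for $RE{comment}{HTML}, minus the outer (?-xism:...) wrapper.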

So unless I've stripped something out that I shouldn't have, the above example would really boil down to the following (non-capturing) regex, useful for stripping html comments:

/<!--[^-]*-[^-]+*--\s**>/

second update: Obviously this won't compile, as per fireartist below. My apologies, I was finishing that up with less than a minute of computer time left :)
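
For reference, a version that does compile, sketched on the assumption that only the outermost non-capturing wrapper and the single-token groups around <! and > can be dropped safely. The groups that carry a * quantifier have to stay, which is fireartist's point. The sample HTML is made up for illustration.

```perl
use strict;
use warnings;

# The quantified groups are kept; only redundant wrappers are removed.
my $comment = qr/<!(?:--[^-]*(?:-[^-]+)*--\s*)*>/;

# Made-up sample input: one simple comment, one with a lone dash,
# and one spanning two lines with whitespace before the closing ">".
my $html = "<p>keep</p><!-- drop me --><p>a - b</p><!-- x - y -->"
         . "<!-- multi\nline -- ><p>end</p>";

(my $stripped = $html) =~ s/$comment//g;

print "$stripped\n";
```

Note that this strips comments wherever they match, including inside script or attribute text, so for real documents the module (or an HTML parser) is still the safer route.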

Replies are listed 'Best First'.
Re^2: Comment Removal
by fireartist (Chaplain) on Oct 17, 2005 at 20:06 UTC

    That doesn't compile: +* and ** aren't valid. The grouping parentheses are still needed.

    perl -MData::Dumper -MRegexp::Common -e 'print Dumper qr/$RE{comment}{HTML}/';
    $VAR1 = qr/(?-xism:(?:(?:<!)(?:(?:--(?:[^-]*(?:-[^-]+)*)--\s*)*)(?:>)))/;