Re: Comment Removal

Given that Regexp::Common matches an HTML comment with the following regex, it's probably best to stick with a module like that one, so as to catch any nasty "unusual" cases of valid HTML comments.

# not useful by itself, see update below
(?k:(?k:<!)(?k:(?:--(?k:[^-]*(?:-[^-]+)*)--\s*)*)(?k:>))

update: I had never seen the (?k:) usage before, and thanks to benizi, I now see it is simply used internally within Regexp::Common's collection of modules to allow for optional capturing of segments of the common regular expression being used. From the docs:

To specify such "optional" capturing parentheses within the regular expression associated with create, use the notation (?k:...). Any parentheses of this type will be converted to (...) when the -keep flag is specified, or (?:...) when it is not.
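
To make the quoted rule concrete, here is a minimal sketch in plain Perl that applies the same rewrite by hand to the pattern above. It does not use Regexp::Common at all: `expand_k` is a hypothetical helper written only for illustration, not the module's actual implementation, and the sample HTML string is made up.

```perl
use strict;
use warnings;

# Pattern text as quoted above, with the (?k:...) markers left intact.
my $template = '(?k:(?k:<!)(?k:(?:--(?k:[^-]*(?:-[^-]+)*)--\s*)*)(?k:>))';

# Hypothetical helper (not part of Regexp::Common): rewrite each (?k:...)
# group to (...) when $keep is true, or to (?:...) when it is false.
sub expand_k {
    my ($pattern, $keep) = @_;
    my $open = $keep ? '(' : '(?:';
    (my $expanded = $pattern) =~ s/\(\?k:/$open/g;
    return $expanded;
}

my $plain = expand_k($template, 0);   # non-capturing form
my $keep  = expand_k($template, 1);   # capturing form

# Strip comments from a made-up sample using the non-capturing form.
my $html = 'a<!-- one -->b<!-- two -- -- three -->c';
(my $stripped = $html) =~ s/$plain//g;
print "$stripped\n";

# With the capturing form, the pieces of a comment land in $1, $2, ...
if ('<!-- one -->' =~ /$keep/) {
    print "whole=[$1] open=[$2] body=[$4] close=[$5]\n";
}
```

The non-capturing form is the one to use for plain stripping; the capturing form only matters when the pieces of the comment are needed afterwards.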

So unless I've stripped something out that I shouldn't have, the above example would really boil down to the following (non-capturing) regex, useful for stripping HTML comments:

/<!--[^-]*-[^-]+*--\s**>/

second update: Obviously this won't compile, as per fireartist below. My apologies, I was finishing that up with less than a minute of computer time left :)

Re^2: Comment Removal by fireartist (Chaplain) on Oct 17, 2005 at 20:06 UTC
That doesn't compile: +* and ** aren't valid quantifiers. The grouping parentheses are still needed.

perl -MData::Dumper -MRegexp::Common -e 'print Dumper qr/$RE{comment}{HTML}/';
$VAR1 = qr/(?-xism:(?:(?:<!)(?:(?:--(?:[^-]*(?:-[^-]+)*)--\s*)*)(?:>)))/;