Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

Hi RegExperts...

Sometimes I have to work with Java... Sometimes I simply hate it! This is one of those occasions...

I have a very simple regular expression that I use very often in Perl. It matches XML comments: <!--([^-]|-[^-])*-->

Now guess what happens when you feed this to Java!

A Stack Overflow!

Can any of you think of an alternative RegEx to match an XML comment?


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Replies are listed 'Best First'.
Re: Regex Problem - alternative searched
by xicheng (Sexton) on May 18, 2007 at 15:30 UTC
    Hi, I looked at the posts in the link you provided. One of the scenarios looks like the exponential explosion caused by using alternation improperly (mentioned in O'Reilly's book "Mastering Regular Expressions" by J. Friedl). One solution might be "unfolding" the alternation by using the following pattern:
    <!--[^-]*(?:-[^-]+)*-->
    which is supposed to be faster than your previous one.
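A minimal Java sketch of the unfolded pattern in use, since Java is where the original expression failed. The class and method names are hypothetical, and the sample inputs are made up for illustration. The thread only reports that the pattern worked for the original poster, so whether it avoids the overflow for every input is not something this sketch proves.

```java
import java.util.regex.Pattern;

class CommentStripper {
    // Unfolded pattern from the reply above: the repeated group no longer
    // contains an alternation.
    static final Pattern XML_COMMENT =
            Pattern.compile("<!--[^-]*(?:-[^-]+)*-->");

    // Hypothetical helper: delete every match of the pattern.
    static String stripComments(String xml) {
        return XML_COMMENT.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // A single hyphen inside a comment is fine, so the comment is removed.
        System.out.println(stripComments("<a><!-- note - one --><b/></a>"));
        // "--" inside a comment is not allowed in XML, so the pattern
        // does not match and the text is left untouched.
        System.out.println(stripComments("<a><!-- a -- b --></a>"));
    }
}
```

The second call shows the "purist" behaviour: text that only looks like a comment is not treated as one.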

    Regards,
    Xicheng

      That's it! Great! Thanks! It works!

      I guess you mentioned a book I should at least take a look at.


Re: Regex Problem - alternative searched
by BrowserUk (Patriarch) on May 18, 2007 at 11:08 UTC

      Yes. Non-greedy is possible.

      But rejecting a hit isn't an option in my use case.

      I wanted to use the pattern in jEdit to search and destroy all comments in an XML file I'm editing.

      To be honest, a simple non-greedy pattern should help in my case, but as I am a purist in some cases, I like having a pattern that matches real comments and not everything that looks similar to a comment ;-)

      So the alternative I'm searching for can't rely on additional checks.


        ... but as I am a purist in some cases, ...

        Sometimes you have to sacrifice purity for practicality, especially in order to circumvent a bug/limitation in other people's code. I tried to think of an alternative to your regex, but anything I thought might work was inevitably more complex and therefore likely to exacerbate the bug/limitation that is preventing you from using the 'right' solution.

        Can Java regex handle non-capturing grouping and negative lookbehinds?

        If so, m[<!--(..(?:(?<!--).)+)-->] might bypass the bug?
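For what it is worth, java.util.regex does accept both non-capturing groups and negative lookbehind, provided the lookbehind has a bounded length. A minimal sketch for trying the pattern above in Java follows. The class and helper names are hypothetical, the Perl m[...] delimiters are dropped, and Pattern.DOTALL is an added assumption so that '.' also matches line breaks. It does not show whether the pattern sidesteps the stack overflow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindCheck {
    // The suggested pattern, compiled as a Java regex.
    // As written it needs a comment body of at least three characters:
    // two for ".." and one for the "+" group.
    static final Pattern COMMENT =
            Pattern.compile("<!--(..(?:(?<!--).)+)-->", Pattern.DOTALL);

    // Hypothetical helper: body of the first match, or null if none.
    static String firstCommentBody(String xml) {
        Matcher m = COMMENT.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A plain comment: the captured body is printed.
        System.out.println(firstCommentBody("<a><!-- note --></a>"));
        // "--" inside the body: the lookbehind blocks the match, prints null.
        System.out.println(firstCommentBody("<a><!-- a -- b --></a>"));
    }
}
```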


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.