in reply to Regex Problem - alternative searched

Does Java do non-greedy matching? If so, you could try /<!--(.+?)--(.)/ and reject the match unless $2 is '>'. That might avoid the stack problem.
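
A minimal sketch of that idea in Java, assuming the check on the second group is done in a loop around `Matcher.find()`. The class and method names (`LazyCommentFinder`, `findComments`) are made up for illustration, and the `(?s)` flag is added so that `.` also matches newlines. Whether this actually avoids the stack problem would need to be tested against the real file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyCommentFinder {
    // Lazy body, then "--" followed by exactly one captured character.
    // (?s) is an addition here so that '.' matches line breaks too.
    private static final Pattern CANDIDATE =
            Pattern.compile("(?s)<!--(.+?)--(.)");

    // Returns the bodies of all candidates whose "--" is followed by '>'.
    // Candidates where group 2 is anything else are rejected.
    static List<String> findComments(String xml) {
        List<String> bodies = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(xml);
        while (m.find()) {
            if (">".equals(m.group(2))) {
                bodies.add(m.group(1));
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: one well-formed comment, one containing "--".
        String xml = "<a><!-- ok --><b/><!-- bad -- here --></a>";
        System.out.println(findComments(xml));
    }
}
```

Note that this only works where code can inspect each match; a plain search-and-replace dialog has no place to put the extra check.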


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Regex Problem - alternative searched
by Skeeve (Parson) on May 18, 2007 at 13:05 UTC

    Yes. Non-greedy is possible.

    But rejecting a hit isn't an option in my use case.

    I wanted to use the pattern in jEdit to search and destroy all comments in an XML file I'm editing.

    To be honest, a simple pattern would do in my case, but as I am a purist in some cases, I like having a pattern that matches real comments and not everything that merely looks like a comment ;-)

    So the alternative I'm searching for can't rely on additional checks.


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      ... but as I am a purist in some cases, ...

      Sometimes you have to sacrifice purity for practicality, especially in order to circumvent a bug or limitation in other people's code. I tried to think of an alternative to your regex, but anything I thought might work was inevitably more complex, and therefore likely to exacerbate the bug or limitation that is preventing you from using the 'right' solution.

      Can Java regex handle non-capturing grouping and negative lookbehinds?

      If so, m[<!--(..(?:(?<!--).)+)-->] might bypass the bug?



        THANKS! A slight modification seems to work: (?s)<!--(.(?<!--))*-->

        At first it appears a bit odd because of the negative lookbehind which looks like an XML Comment start ;-)
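
For readers who want to try the pattern outside jEdit, here is a rough Java sketch. The class and method names (`CommentStripper`, `stripComments`) are made up for illustration; only the pattern itself comes from the post above. The lookbehind runs after each body character and fails as soon as the last two characters consumed are `--`, so the closing `--` can only be matched by the literal `-->` at the end.

```java
import java.util.regex.Pattern;

public class CommentStripper {
    // The pattern from the post: (?s) lets '.' match newlines, and the
    // negative lookbehind rejects any body character that completes a "--".
    private static final Pattern COMMENT =
            Pattern.compile("(?s)<!--(.(?<!--))*-->");

    // Removes every span the pattern matches; everything else is left alone.
    static String stripComments(String xml) {
        return COMMENT.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a multi-line comment, and one with "--" inside.
        System.out.println(stripComments("<a><!-- one\nline two --><b/></a>"));
        System.out.println(stripComments("<a><!-- bad -- here --></a>"));
    }
}
```

A span with `--` inside its body is left untouched, which is the "real comments only" behaviour asked for. Whether the repeated group stays within the stack limit on very long comments is not something this sketch checks.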

