Funny, I thought I was getting back inside the box by using a simple regex. :-)

I've never been totally comfortable with zero-width assertions, because logically it seems to me that they should keep matching in the same spot. If zero-width really means zero-width, the next match should start at the same point and match the same string again, shouldn't it? I guess the semantics just bug me. I feel like I should have to use something like this to make sure it advances a character:

m[([ACGT](?=[ACGT]{6}))];

Anyway, I'm impressed that the regex does as well as it does, considering how simple it was to implement.

Aaron B.
Available for small or large Perl jobs; see my home node.

Re^7: counting the number of 16384 pattern matches in a large DNA sequence
by BrowserUk (Patriarch) on Jun 15, 2012 at 16:37 UTC
    If zero-width really means zero-width, the next match should start at the same point and match the same string again, shouldn't it?

    After a zero-length match, the regex engine will not accept another zero-length match at the same position, so the next attempt effectively moves on by at least 1 character. That ensures it always makes progress, and it prevents the endless looping that would otherwise result.
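
    A minimal sketch of that behaviour, to make the point concrete. The 9-base sequence is made up for illustration, and the pattern (a 7-mer captured inside a lookahead) is an assumption about the kind of regex under discussion, not necessarily the exact one used earlier in the thread:

```perl
use strict;
use warnings;

# Hypothetical 9-base sequence, for illustration only.
my $seq = 'ACGTACGTA';

# The whole pattern is a zero-width lookahead, so every match has length 0.
my ( @positions, @kmers );
while ( $seq =~ m[(?=([ACGT]{7}))]g ) {
    push @positions, pos($seq);    # where this zero-length match sits
    push @kmers,     $1;           # the 7-mer seen by the lookahead
}

print "@positions\n";    # 0 1 2
print "@kmers\n";        # ACGTACG CGTACGT GTACGTA
```

    Each match is zero characters wide, yet pos() still advances one position per iteration, so the loop terminates once fewer than 7 bases remain.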

    Once you know that, the semantics are very useful.

    Conversely, this m[([ACGT](?=[ACGT]{6}))]; only captures a single character each time. Whatever a lookahead matches is not captured unless the capturing parens are inside the lookahead construct.
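
    To see the difference side by side, here is a small sketch. The sequence is hypothetical, and the second pattern (capturing parens moved inside the lookahead) is an assumed alternative for comparison:

```perl
use strict;
use warnings;

# Hypothetical 9-base sequence, for illustration only.
my $seq = 'ACGTACGTA';

# Capturing parens around the lookahead: only the one consumed base is captured.
my @single = $seq =~ m[([ACGT](?=[ACGT]{6}))]g;

# Capturing parens inside the lookahead: each overlapping 7-mer is captured.
my @overlap = $seq =~ m[(?=([ACGT]{7}))]g;

print "@single\n";     # A C G
print "@overlap\n";    # ACGTACG CGTACGT GTACGTA
```

    Both patterns match at the same three positions; they differ only in what ends up in $1.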

    Again, once you know the (slightly counter-intuitive) semantics, this proves to be quite useful.

