The regexp concept that maps to "... but not followed by ..." is a negative lookahead, (?!...). In this case, you want to find a letter, require that it be followed by the same letter, and require that this second letter is not followed by the same letter again:

    m{
        ([A-Za-z])  # find a letter (and capture it)
        \1          # followed by the same
        (?!\1)      # but not by the same again
    }x;

However, this will match "baaac", since it can match starting at the third character. To reject triples fully, we also need to specify that the first character we match doesn't have the same character before it.
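A small script to see this for yourself, using the pattern above unchanged; the word list is made up for illustration. The special arrays @- and @+ hold the start and end offsets of the last successful match, which shows where the unwanted match begins.

```perl
use strict;
use warnings;

# The lookahead-only pattern from above, unchanged.
my $naive = qr{
    ([A-Za-z])  # find a letter (and capture it)
    \1          # followed by the same
    (?!\1)      # but not by the same again
}x;

for my $word (qw(aab abb aaab baaac abc)) {
    if ($word =~ $naive) {
        # @- and @+ hold the start and end offsets of the last match
        my $hit = substr $word, $-[0], $+[0] - $-[0];
        printf "%-6s matches '%s' at offset %d\n", $word, $hit, $-[0];
    }
    else {
        printf "%-6s no match\n", $word;
    }
}
```

With this list, "aaab" and "baaac" are both reported as matches (at offsets 1 and 2 respectively), because the regex engine simply retries from a later starting position once the run of a's is partly behind it.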

Easiest would be if we could place a negative lookbehind first: /(?<!...) ([A-Za-z]) \1 (?!\1)/x. But that doesn't work in this case: we don't yet know what letter to reject.
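For comparison, the lookbehind-first idea does work once the letter is known in advance. One workaround (a different technique from the one this post goes on to use) is to generate a separate branch for each concrete letter and join them with alternation. This is a sketch assuming ASCII letters only; $per_letter is just an illustrative name.

```perl
use strict;
use warnings;

# One branch per concrete letter, e.g. (?<!a)aa(?!a), so each lookbehind
# knows exactly which letter it has to reject.
my $alternation = join '|',
    map { "(?<!$_)$_$_(?!$_)" } 'a' .. 'z', 'A' .. 'Z';
my $per_letter = qr/$alternation/;

for my $word (qw(aab abb aaab baaac abc)) {
    printf "%-6s %s\n", $word, $word =~ $per_letter ? 'match' : 'no match';
}
```

This rejects "aaab" and "baaac" correctly, at the cost of a 52-branch pattern; it does not generalise to an open-ended set of characters.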

Next easiest would be to capture the letter of interest, then use a negative lookbehind to check two characters back: the one we just captured and its predecessor. Unfortunately that doesn't work either, since perl rejects m{([A-Za-z])(?<!\1.)} at compile time: earlier perls say "Variable length lookbehind not implemented", while more recent perls say "Lookbehind longer than 255 not implemented". In either case the reason is that perl is not clever enough to determine at compile time how long the capture can be.
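To check this on your own perl, the pattern can be kept in a string and compiled inside eval, so the failure happens at run time and can be caught; a literal qr// would instead stop the whole script at compile time. The exact message depends on the perl version, as described above, so this sketch just reports whatever $@ contains. The variable names are illustrative.

```perl
use strict;
use warnings;

# Keep the pattern in a single-quoted string so \1 survives as-is and the
# regex is only compiled at run time, inside eval.
my $source = '([A-Za-z])(?<!\1.)';

my $compiled = eval { qr/$source/ };
my $error    = $@;    # save it before anything else can reset $@

if (defined $compiled) {
    print "perl $] compiled it\n";
}
else {
    print "perl $] rejected it: $error";
}
```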

So instead we have to work forward: if we're not at the start of the string, require that the notionally "preceding" character is different from our character of interest:

    m{
        (?:             # either
            ^           # start of string
        |               # or
            (.) (?!\1)  # any character that is not followed by a duplicate
        )
        # now proceed as before, keeping in mind this is now the second capture
        ([A-Za-z]) \2 (?!\2)
    }x;
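A quick harness for exercising the final pattern against the cases discussed so far. The word list is illustrative, and has_exact_double is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# The "work forward" pattern from above: the leading group either anchors
# at the start of the string or consumes a character that differs from
# the next one, so the doubled letter cannot be the tail of a longer run.
my $exact_double = qr{
    (?:             # either
        ^           #   start of string
    |               # or
        (.) (?!\1)  #   any character not followed by a duplicate
    )
    ([A-Za-z]) \2 (?!\2)   # a letter, the same again, but not a third time
}x;

# Hypothetical helper: true if the word contains an exact double letter.
sub has_exact_double {
    my ($word) = @_;
    return $word =~ $exact_double ? 1 : 0;
}

for my $word (qw(aab abb aaab baaac abbc aabbb abc)) {
    printf "%-6s %s\n", $word, has_exact_double($word) ? 'match' : 'no match';
}
```

Here "aab" matches via the ^ branch, "aaab" and "baaac" are rejected, and "aabbb" still matches because of its leading "aa".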

(Struck out:) Note that the second of these approaches only works if there is at least one character before the double letter, so will fail to match "aab" - probably not what you want in this case, so I show it only for completeness. The first of these approaches should work for all the cases you care about.

Update 2024-08-17: struck last paragraph, which was left over from initial editing.


In reply to Re: regex to match double letters but not triple or more by hv
in thread regex to match double letters but not triple or more by Anonymous Monk
