Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have tried

/([A-Za-z])\1{1}/

but this also finds pairs of letters inside longer runs. My intent is to match only letters that appear exactly twice in a row, not single letters and not runs of three or more.

"caab" should match

"cdaaadc" should not match

The reason is a puzzle I have been making for personal use. More in the nature of expanding my RE chops than anything else. Please let me know if I can provide more information or if there are any solutions to this. Thank you!

Replies are listed 'Best First'.
Re: regex to match double letters but not triple or more
by hv (Prior) on Aug 15, 2024 at 16:57 UTC

    The regexp concept that maps to "... but not followed by ..." is a negative lookahead (?!...). In this case, you want to find a letter, require that it be followed by the same letter, and require that the pair is not followed by the same letter again:

    m{
        ([A-Za-z])    # find a letter (and capture it)
        \1            # followed by the same
        (?!\1)        # but not by the same again
    }x;

    However, this will match "baaac", since it can match starting at the third character. To reject triples fully, we also need to specify that the first character we match doesn't have the same character before it.
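
    A quick way to see that false positive, as a minimal sketch. The helper name first_pair_offset is made up for illustration, and "caab" and "cdaaadc" are the strings from the question. It reports the offset at which the lookahead-only pattern matches:

    ```perl
    use strict;
    use warnings;

    # Lookahead-only pattern: a letter, the same letter again,
    # then anything except a third copy.
    my $re = qr{ ([A-Za-z]) \1 (?!\1) }x;

    # Hypothetical helper: offset of the first match, or -1 if none.
    sub first_pair_offset {
        my ($str) = @_;
        return $str =~ $re ? $-[0] : -1;
    }

    for my $str ( "caab", "baaac", "cdaaadc" ) {
        my $offset = first_pair_offset($str);
        print "$str: ", ( $offset >= 0 ? "matched at offset $offset" : "no match" ), "\n";
    }
    ```

    All three strings match, including the two that contain a triple, because the engine is free to start at the second letter of the run.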

    Easiest would be if we could place a negative lookbehind first: /(?<!...) ([A-Za-z]) \1 (?!\1)/x. But that doesn't work in this case: we don't yet know what letter to reject.

    Next easiest would be to capture the letter of interest, then use a negative lookbehind to check two characters back: the one we just captured and its predecessor. Unfortunately that doesn't work either, since perl rejects m{([A-Za-z])(?<!\1.)} at compile time: earlier perls say "Variable length lookbehind not implemented", while more recent perls say "Lookbehind longer than 255 not implemented", in either case because they are not clever enough to determine at compile time how long the capture can be.

    So instead we have to work forward: if we're not at the start of the string, require that the notionally "preceding" character is different from our character of interest:

    m{
        (?:
            # either
            ^             # start of string
        |
            # or
            (.) (?!\1)    # any character that is not followed by a duplicate
        )
        # now proceed as before, keeping in mind this is now the second capture
        ([A-Za-z]) \2 (?!\2)
    }x;
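
    To check that pattern against the cases in this thread, something like the following sketch should do. The helper name doubled_letter is hypothetical; "aab" and "baaac" are extra test strings alongside the two from the question:

    ```perl
    use strict;
    use warnings;

    # Capture 1 is the (optional) preceding character;
    # capture 2 is the doubled letter.
    my $re = qr{
        (?: ^ | (.) (?!\1) )
        ([A-Za-z]) \2 (?!\2)
    }x;

    # Hypothetical helper: returns the doubled letter, or undef if none.
    sub doubled_letter {
        my ($str) = @_;
        return $str =~ $re ? $2 : undef;
    }

    for my $str ( "caab", "cdaaadc", "aab", "baaac" ) {
        my $letter = doubled_letter($str);
        printf "%-8s %s\n", $str,
            defined $letter ? "match ($letter$letter)" : "no match";
    }
    ```

    Here "caab" and "aab" match on "aa", while "cdaaadc" and "baaac" do not match.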

    [Struck by the author, see the update below:] Note that the second of these approaches only works if there is at least one character before the double letter, so will fail to match "aab" - probably not what you want in this case, so I show it only for completeness. The first of these approaches should work for all the cases you care about.

    Update 2024-08-17: struck last paragraph, which was left over from initial editing.

Re: regex to match double letters but not triple or more
by hippo (Archbishop) on Aug 15, 2024 at 16:44 UTC
Re: regex to match double letters but not triple or more
by ikegami (Patriarch) on Aug 15, 2024 at 16:18 UTC

    Depending on what you are doing, you might find it cleaner to extract all same-letter sequences, then filter out those of the incorrect length.

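    if ( grep length( $_ ) == 2, /(([A-Za-z])\2+)/g ) {
        # Group 1 is the whole run, group 2 the single letter; the
        # one-character group-2 captures never pass the length test.
        say "match";
    }
    else {
        say "no match";
    }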
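
    If you want the pairs themselves rather than a yes/no answer, the same extract-then-filter idea can be sketched like this. The helper name exact_pairs is made up, and the nested group with \2 is my assumption about how to capture a whole run:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns every run of a repeated letter
    # that is exactly two characters long.
    sub exact_pairs {
        my ($str) = @_;
        my @runs;
        # Group 1 is the whole run; group 2 is the letter that \2 repeats.
        while ( $str =~ /(([A-Za-z])\2+)/g ) {
            push @runs, $1;
        }
        return grep { length($_) == 2 } @runs;
    }

    for my $str ( "caab", "cdaaadc", "aabbbccdee" ) {
        my @pairs = exact_pairs($str);
        print "$str: ", ( @pairs ? "match (@pairs)" : "no match" ), "\n";
    }
    ```

    Because each run is consumed whole before its length is checked, a triple can never be reported as a pair.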

    Otherwise, you could use (*SKIP).

    /([A-Za-z])\1(?:\1+(*SKIP)(*FAIL))?/

    For example:

    perl -e'
       use v5.14;
       for ( "aa", "bbb", "cccc", "deefffgggghhiiiii" ) {
          say s/([A-Za-z])\1(?:\1+(*SKIP)(*FAIL))?/[$&]/gr;
       }
    '
    [aa]
    bbb
    cccc
    d[ee]fffgggg[hh]iiiii
Re: regex to match double letters but not triple or more
by tybalt89 (Monsignor) on Aug 15, 2024 at 21:56 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11161117
    use warnings;

    for ( qw( a aa bbb cccc caab deefffgggghhiiiii aabbbccdee wwxxyyzz ) )
    {
        printf "%20s -->", $_;
        print " $2" while /(?=(?:^|(.)(?!\1))(([a-z])\3)(?!\3))/gi;
        print "\n";
    }

    Outputs:

                       a -->
                      aa --> aa
                     bbb -->
                    cccc -->
                    caab --> aa
       deefffgggghhiiiii --> ee hh
              aabbbccdee --> aa cc ee
                wwxxyyzz --> ww xx yy zz
Re: regex to match double letters but not triple or more
by LanX (Saint) on Aug 15, 2024 at 21:57 UTC

    DB<83> $re= qr/(?:^|(.)(?!\1)) ([[:alpha:]]){2} (?!\2)/x

    DB<84> say join "\t", $_, $_ =~ $re for <{,x}AA{,A}{,y}>
    AA              A
    AAy             A
    AAA
    AAAy
    xAA     x       A
    xAAy    x       A
    xAAA
    xAAAy

    DB<85>

    Edit

    Just realized, in the end it's very similar to the last solution in Re: regex to match double letters but not triple or more.

    But somehow it works for me...???
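
    One thing that may be worth checking: ([[:alpha:]]){2} matches any two letters in a row, not the same letter twice, so the tests above may pass only because every input uses "A". A small sketch comparing it with a backreference form; the second pattern and both helper names are assumptions for illustration, not something from the posts above:

    ```perl
    use strict;
    use warnings;

    my $quantified = qr/(?:^|(.)(?!\1)) ([[:alpha:]]){2} (?!\2)/x;   # as posted
    my $backref    = qr/(?:^|(.)(?!\1)) ([[:alpha:]])\2  (?!\2)/x;   # assumed variant

    # Hypothetical helpers returning 1 or 0.
    sub matches_quantified { return $_[0] =~ $quantified ? 1 : 0 }
    sub matches_backref    { return $_[0] =~ $backref    ? 1 : 0 }

    for my $str ( "ab", "xAy", "AA", "xAAA" ) {
        printf "%-5s quantified: %d  backref: %d\n", $str,
            matches_quantified($str), matches_backref($str);
    }
    ```

    The quantified form accepts "ab" and "xAy", which contain no double letter; the backreference form rejects them, and both agree on "AA" and "xAAA".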

    Updates
    improved character class and tests

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery