bobf has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to construct a regex that will match the pattern 'XXY', where X and Y can be any word character, but X and Y must be different characters. In English, I want to find all occurrences where a given character is duplicated and is followed by a different character.
Attempt 1: At first I thought this would be relatively straightforward by using a backreference in a negated character class, but I seem to be missing something. Could someone please explain why the code below doesn't DWIM?
use strict;
use warnings;

my $string = 'ABCDEEFGHIJJJKLMNOOOOPQRSTUVWXXYZ';
print "matching: $string\n";

while( $string =~ m/((\w)\2[^\2])/g )
{
    print $1, "\n";
}

=pod

matching: ABCDEEFGHIJJJKLMNOOOOPQRSTUVWXXYZ
EEF
JJJ
OOO
XXY

=cut
The regex is matching 'EEF' and 'XXY', which are correct, but it is also matching 'JJJ' and 'OOO'. The negated character class isn't acting how I expected.
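A probable explanation for the behaviour described above: inside a bracketed character class, `\2` is not treated as a backreference. It is read as an octal escape, so `[^\2]` means "any character except chr(2)". The sketch below checks that reading and reruns the original pattern; the variable names `$j_ok`, `$ctrl_ok` and `@got` are illustrative only.

```perl
use strict;
use warnings;

my $string = 'ABCDEEFGHIJJJKLMNOOOOPQRSTUVWXXYZ';

# Inside [...] the sequence \2 is taken as an octal escape, i.e. chr(2),
# so [^\2] accepts every character other than chr(2).
my $j_ok    = ( 'J'    =~ /\A[^\2]\z/ ) ? 1 : 0;   # 'J' is accepted
my $ctrl_ok = ( chr(2) =~ /\A[^\2]\z/ ) ? 1 : 0;   # chr(2) is rejected

# The original pattern therefore acts like "a doubled word character
# followed by almost anything", which is why JJJ and OOO slip through.
my @got;
while ( $string =~ m/((\w)\2[^\2])/g ) {
    push @got, $1;
}

print "J accepted: $j_ok, chr(2) accepted: $ctrl_ok\n";
print "$_\n" for @got;
```

Because a character class is fixed when the pattern is compiled, it cannot depend on what a group captured at match time, so a different construct is needed to say "not the same character as before".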
Attempt 2: I also tried using a negative lookahead assertion, but also without success:
while( $string =~ m/((\w)\2($!\2)\w)/g )
{
    print $1, "\n";
}

=pod

matching: ABCDEEFGHIJJJKLMNOOOOPQRSTUVWXXYZ
JJJK
OOOO

=cut
This regex matches four characters rather than the 3 I expected (since the lookahead is zero-width), and it also lacks specificity at the last position (matching 'OOOO').
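A likely cause, offered as an assumption rather than something the post states: `($!\2)` is not lookahead syntax. Perl appears to interpolate `$!` into the pattern, and with no error pending it is an empty string, which leaves an ordinary capturing group `(\2)`. That would account for the four-character matches. The negative lookahead is spelled `(?!...)`. A minimal sketch with that spelling follows; `@found` is an illustrative name.

```perl
use strict;
use warnings;

my $string = 'ABCDEEFGHIJJJKLMNOOOOPQRSTUVWXXYZ';

my @found;
# (\w)    capture a word character (group 2; group 1 wraps the whole match)
# \2      the same character again
# (?!\2)  zero-width check that the next character is NOT that character
# \w      consume the differing word character
while ( $string =~ m/((\w)\2(?!\2)\w)/g ) {
    push @found, $1;
}

print "$_\n" for @found;   # EEF, JJK, OOP, XXY
```

Note that in a run of three or more identical characters the match starts at the last two of the run, so 'JJJK' yields 'JJK' and 'OOOOP' yields 'OOP'. If such runs should not match at all, an additional check before the pair would be needed.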
The whole story: Understanding this problem is only part of my goal. I'm actually trying to match 'AABCCCCAD'. My first attempt was this:
$string = 'WXYYZAABCCCCADWWXYYYZ';
while( $string =~ m/((\w)\2([^\2])([^\2\3]){4}\1[^\2\3\4])/g )
{
    print $1, "\n";
}
but, given my first question, this obviously doesn't work.
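One possible reading of the longer pattern, sketched with negative lookaheads in place of the negated classes. Several points here are assumptions, not taken from the post: that the second-to-last element was meant to repeat the first character (group 2) rather than the enclosing group, that the four middle characters only need to differ from the first two captures, and that the final character must differ from the last of those four. The names `$re` and `@hits` are illustrative.

```perl
use strict;
use warnings;

my $string = 'WXYYZAABCCCCADWWXYYYZ';

my $re = qr/
    (                       # group 1: the whole match
      (\w) \2               # group 2: X, then X again
      (?!\2) (\w)           # group 3: Y, which must differ from X
      ( (?!\2|\3) \w ){4}   # group 4: four characters, none equal to X or Y
      \2                    # X again (assumed intent at this position)
      (?!\2|\3|\4) \w       # final character differing from X, Y and group 4
    )
/x;

my @hits;
while ( $string =~ /$re/g ) {
    push @hits, $1;
}

print "$_\n" for @hits;   # AABCCCCAD
```

Because group 4 is quantified, it holds only the last of the four characters, so the final lookahead compares against that one character. If the four middle characters must all be identical, the group would need a backreference to its own first character instead.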
Educate me in the ways of thine regexen, that I might faithfully wield their power.
Many thanks in advance.
Replies are listed 'Best First'.

Re: Backreferences in negated character classes (two)
by tye (Sage) on Dec 21, 2005 at 08:03 UTC

Re: Backreferences in negated character classes
by GrandFather (Saint) on Dec 21, 2005 at 07:53 UTC

Re: Backreferences in negated character classes
by GrandFather (Saint) on Dec 21, 2005 at 08:18 UTC

Re: Backreferences in negated character classes
by hv (Prior) on Dec 21, 2005 at 10:55 UTC
by bobf (Monsignor) on Dec 21, 2005 at 15:22 UTC

Re: Backreferences in negated character classes
by japhy (Canon) on Dec 21, 2005 at 14:47 UTC