
Many thanks for all of the responses. GrandFather, you were right on target as always; tye, thanks for cutting to the heart of the problem and pointing out a duh moment for me. :-)

The first problem is that character classes are constructed when the regexp is compiled and do not change during the matching process. Because of that, the special syntax for backreferences in regexps does not extend inside a character class, so, as tye mentioned, the '\2' is actually treated as ASCII character 2 (an octal escape) rather than as a reference to the second capture group.
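To make that concrete, here is a minimal sketch comparing a negated character class with a negative lookahead. The sub names are made up for illustration and are not from the thread; the point is only that the class never sees the captured character, while the lookahead does.

```perl
use strict;
use warnings;

# Inside [...], \1 is an octal escape for chr(1), not capture group 1,
# so this class does NOT exclude the character captured by (\w).
sub class_version_matches {
    my ($str) = @_;
    return $str =~ /^(\w)[^\1]$/ ? 1 : 0;
}

# A negative lookahead is evaluated during matching, so \1 here really is
# the captured character.
sub lookahead_version_matches {
    my ($str) = @_;
    return $str =~ /^(\w)(?!\1)\w$/ ? 1 : 0;
}

for my $str (qw(aa ab)) {
    printf "%s: class=%d lookahead=%d\n",
        $str, class_version_matches($str), lookahead_version_matches($str);
}
```

For "aa" the class version still matches, because the second "a" is simply "not chr(1)", while the lookahead version rejects it.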

the ($!\2) in your example actually interpolated the $! error variable into your regexp

Thank you for stating that so explicitly - that was a core piece of knowledge that I was missing. Now I understand why my incorrect negative lookahead was matching four characters. I can't believe I missed the obvious typo in the lookahead ($! instead of ?!). I guess that's what I get for playing with regexen so late at night. :-)

This gives you something nice and regular - it would be quite easy to write code to generate the above from the example string. Here's how it might work:

Thanks for the great example for building this type of regex on the fly. I wanted to capture the whole match, so I changed it as follows:

my $regex = mkre($s);
while( $string =~ m/$regex/g )
{
    print $1, "\n";
    # do other stuff
}

sub mkre {
  my $s = shift;
  my $index = 1; # using \1 to capture the whole match
  my(%seen, @elems);
  for (split //, $s) {
    if ($seen{$_}) {
      push @elems, "\\$seen{$_}";
    } else {
      push @elems, sprintf '(?! %s)', join ' | ', map "\\$_", 2 .. $index
          if $index > 1; # changed to start with \2
      $seen{$_} = ++$index;
      push @elems, '(\\w)';
    }
  }
  my $re = join( ' ', '(', @elems, ')' ); # create \1
  warn "$s: $re\n" if $DEBUG;
  qr/$re/x;
}

Then I realized I could have left the sub as-is and just printed $& instead. :-)
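For comparison, here is a rough sketch of that alternative. It assumes the unmodified sub numbered its groups from \1 and added no outer capture; mkre_plain and whole_matches are names I made up for this illustration, and the test string is just sample data.

```perl
use strict;
use warnings;

# Assumed shape of the unmodified generator: groups start at \1 and the
# pattern is not wrapped in an extra capture group.
sub mkre_plain {
    my $s = shift;
    my $index = 0;
    my (%seen, @elems);
    for my $ch (split //, $s) {
        if ($seen{$ch}) {
            # Repeated letter: must equal the group captured earlier.
            push @elems, "\\$seen{$ch}";
        } else {
            # New letter: must differ from every group captured so far.
            push @elems, sprintf '(?! %s)', join ' | ', map "\\$_", 1 .. $index
                if $index;
            $seen{$ch} = ++$index;
            push @elems, '(\\w)';
        }
    }
    my $re = join ' ', @elems;
    return qr/$re/x;
}

# Collect each whole match with $& instead of relying on an outer group.
sub whole_matches {
    my ($pattern, $string) = @_;
    my $regex = mkre_plain($pattern);
    my @found;
    while ($string =~ m/$regex/g) {
        push @found, $&;
    }
    return @found;
}

print join(',', whole_matches('abba', 'xyyx noon abab deed')), "\n";
```

One thing to keep in mind: on older perls, mentioning $& anywhere in a program slows down every regexp match, so the extra capture group may still be the better choice there.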

Thanks again for the help, and for such an elegant solution.

Update: japhy++ Very nice solution - taking that approach would enable me to create much more flexible (and more powerful) regexps. Thanks for posting it.

In reply to Re^2: Backreferences in negated character classes by bobf
in thread Backreferences in negated character classes by bobf
