sterben has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks, I've been trying to figure out how to change the matching at the end of a string based on what I matched at the beginning. I've read up on conditional expressions and lookarounds, but I can't seem to figure out an elegant way to do this. As an example, take the nspI restriction enzyme recognition sequence:
/[AG]CATG[CT]/
By itself, it matches fine. However, I was wondering how I could force it to be palindromic; that is, if I match an 'A' with the first part then to force it to match a 'T' in the second. The best I've come up with so far is:
/[AG]CATG(?(?<=ACATG)T|C)/
which works ok. However, I was wondering if there is an easier/better way of doing this such that I don't have to try and match the entire sequence in the backreference and instead can just query the AG. Thank you for your time.

Replies are listed 'Best First'.
Re: Question on conditional regular expressions/backreferences
by moritz (Cardinal) on Jul 01, 2008 at 22:55 UTC
    In the general case your problem is ugly to solve with regexes because, from a computer science point of view, what you are describing is not a regular language but a (deterministic) context-free one.

    If the number of possible outcomes is small, you can just construct an alternation manually:

    /ACATGT|GCATGC/
    If you have larger sets of alternations, your best bet is to assemble the alternatives with a program (perl 5.10 has very good optimizations for larger sets of constant alternatives), or to use a postponed regular subexpression like this:
    our %map = ( A => 'T', G => 'C' ); /([AG])CATG(??{ $map{$1} })/

    (Untested.) But if you do the latter, please read the warnings in perlre first: (??{...}) is an experimental feature.

    This one captures the result from the [AG] in the variable $1, looks it up in %map, and then matches against that transformed version.
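
    For comparison, here is a minimal sketch of both suggestions side by side: assembling the alternation with a program, and the postponed subexpression. It assumes the pairing from the question (A with T, G with C); the variable names $alternation, $alt_re and $dyn_re and the four test sequences are made up for illustration.

```perl
use strict;
use warnings;

# Pairing of the first base with the last base (assumption taken from the
# question: A pairs with T, G pairs with C).
our %map = ( A => 'T', G => 'C' );

# Approach 1: assemble the constant alternatives with a program.
# For this map the string is 'ACATGT|GCATGC'.
my $alternation = join '|', map { "${_}CATG$map{$_}" } sort keys %map;
my $alt_re      = qr/$alternation/;

# Approach 2: postponed subexpression. The code block runs at match time,
# after $1 has been set, and its return value is matched as a pattern.
my $dyn_re = qr/([AG])CATG(??{ $map{$1} })/;

# Two palindromic sites and two non-palindromic ones (illustrative input).
for my $seq (qw(ACATGT GCATGC ACATGC GCATGT)) {
    my $alt = $seq =~ $alt_re ? 'match' : 'no match';
    my $dyn = $seq =~ $dyn_re ? 'match' : 'no match';
    print "$seq: alternation=$alt, postponed=$dyn\n";
}
```

    Both patterns should agree on every input. The generated alternation avoids the experimental construct entirely, which is probably the safer choice as long as the set of alternatives stays manageable.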