Question on conditional regular expressions/backreferences

sterben has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks, I've been trying to figure out how to change what a pattern matches at the end of a string based on what it matched at the beginning. I've read up on conditional expressions and lookarounds, but I can't figure out an elegant way to do this. As an example, take the NspI restriction enzyme recognition sequence:

/[AG]CATG[CT]/

By itself, it matches fine. However, I was wondering how I could force the match to be palindromic; that is, if the first part matches an 'A', the last part must match a 'T' (and a 'G' at the start requires a 'C' at the end). The best I've come up with so far is:

/[AG]CATG(?(?<=ACATG)T|C)/

which works OK. However, I was wondering if there is an easier or better way of doing this, so that I don't have to match the entire sequence again in the lookbehind and can instead just test which of A or G was matched. Thank you for your time.

Re: Question on conditional regular expressions/backreferences
by moritz (Cardinal) on Jul 01, 2008 at 22:55 UTC

In the general case your problem is ugly to solve with regexes because, from a computer science point of view, what you need is not a regular language but a (deterministic) context-free one.

If the number of possible outcomes is small, you can just construct an alternation manually:

/ACATGT|GCATGC/

If you have larger sets of alternatives, your best bet is to assemble them with a program (perl 5.10 has very good optimizations for larger sets of constant alternatives), or to use a postponed regular subexpression like this:

our %map = ( A => 'T', G => 'C' );
/([AG])CATG(??{ $map{$1} })/

(Untested). But if you do the latter, please read the warnings in perlre first: (??{...}) is an experimental feature.

This one captures the result from the [AG] in the variable $1 and then matches against a transformed version of that.
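
The options discussed in this reply can be put side by side in a short script. This is only a sketch under a few assumptions, not code from the thread: the names %complement and %patterns are made up for illustration, the table is declared with "our" because lexical variables inside regex code blocks were unreliable in older perls, and the (?(1)T|C) variant is an additional possibility that tests whether capture group 1 took part in the match, which avoids repeating the whole sequence in a lookbehind.

```perl
use strict;
use warnings;

# Hypothetical lookup table: which base must close the site for each opening base.
our %complement = ( A => 'T', G => 'C' );

# Option 1: assemble the constant alternatives with a program.
# For this table the joined string is "ACATGT|GCATGC".
my $alternatives = join '|',
    map { $_ . 'CATG' . $complement{$_} } sort keys %complement;

my %patterns = (
    # Plain alternation built from the table above.
    alternation => qr/(?:$alternatives)/,

    # Option 2: conditional on a capture group. Group 1 is set only when
    # the site starts with 'A', so (?(1)T|C) requires 'T' in that case
    # and 'C' otherwise (that is, when the site starts with 'G').
    conditional => qr/(?:(A)|G)CATG(?(1)T|C)/,

    # Option 3: postponed subexpression. The code block runs during the
    # match and its return value is matched as a pattern at that point.
    postponed   => qr/([AG])CATG(??{ $complement{$1} })/,
);

# Try every pattern against the four combinations of end bases.
for my $seq (qw(ACATGT GCATGC ACATGC GCATGT)) {
    for my $name (sort keys %patterns) {
        my $result = $seq =~ $patterns{$name} ? 'match' : 'no match';
        printf "%-8s %-12s %s\n", $seq, $name, $result;
    }
}
```

All three patterns should accept ACATGT and GCATGC and reject ACATGC and GCATGT. The conditional form only scales comfortably to a two-way choice; for larger tables the generated alternation or the (??{...}) lookup is probably easier to maintain.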
