Re^2: finding probes with one mutation ?

Perl is cool in that you can generate regexes "on the fly"; they do not have to be fixed when the program is first compiled. You may be surprised at how fast this works with Perl 5.10, which added some smart regex engine optimizations. With this approach you just check each of the other test cases against the generated regex. A smart regex engine does not have to try every "OR" alternative in turn.

Another way, which is more like C, would be to XOR each character in the "search" string against the character at the same position in the "test" string. The result is zero ("false") where the characters are the same and non-zero ("true") where they differ. If you get more than one "true", the strings differ in more than one position. The AGREP (approximate grep) algorithm has a fancy way to do this.
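Here is a minimal sketch of the XOR idea, not code from my actual program. The sub name mismatches is made up for illustration, and it assumes both strings are plain byte strings (ASCII, like DNA letters) of equal length. Perl's ^ operator XORs two strings byte by byte, and tr with the /c flag counts the bytes that are not NUL:

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions at which two equal-length
# byte strings differ. XOR of two identical bytes is "\0", so every
# non-NUL byte in the result marks one mismatch.
sub mismatches {
    my ($search, $test) = @_;
    return undef if length($search) != length($test);   # lengths must agree
    my $diff = $search ^ $test;                          # string-wise XOR
    my $count = ($diff =~ tr/\0//c);                     # count non-NUL bytes
    return $count;
}

for my $probe (qw(CACTCT CACTGT GACTGT)) {
    my $n = mismatches('CACTCT', $probe);
    print "$probe differs from CACTCT in $n position(s)\n";
}
```

A probe is "one mutation away" when the count is exactly 1.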

In general I think you will find that matching with the regex engine is faster than comparing with substr, especially on later versions of Perl. By all means use substr to make the regex!
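Something along these lines would build such a regex with substr. This is only a sketch: the sub name one_off_regex is invented here, and anchoring the pattern to the whole string is my assumption (drop the anchors if you want to search inside a longer sequence). Each alternative is the probe with one position replaced by ".":

```perl
use strict;
use warnings;

# Hypothetical builder: returns a compiled regex that matches any string
# of the same length as $probe that agrees with it everywhere except,
# at most, one position. Note that it also matches $probe itself.
sub one_off_regex {
    my ($probe) = @_;
    my @alts;
    for my $i (0 .. length($probe) - 1) {
        # literal prefix, one wildcard, literal suffix
        push @alts, quotemeta(substr($probe, 0, $i))
                  . '.'
                  . quotemeta(substr($probe, $i + 1));
    }
    my $pattern = join '|', @alts;
    return qr/\A(?:$pattern)\z/;
}

my $re = one_off_regex('CACTCT');
for my $candidate (qw(CACTCT CACTGT GACTGT)) {
    my $verdict = $candidate =~ $re ? 'matches' : 'does not match';
    print "$candidate $verdict\n";
}
```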

One of my programs uses some fancy stuff which allows swapping of two letters, a missing letter, an added letter, and other weird stuff. This approach of "writing a program" to analyze the rest of the data has worked well for me. When you write a regex, you are "writing a program". For example, one of the things that my software does is allow "pivoting" around a digit: XY6Z->XZ6Y would be a match. This is possible with the regex engine in Perl alone, but the regex builder already knows that this is a possible variation, so instead of super duper fancy regex stuff, I just put an "or" term in the generated regex for XY6Z (i.e. so that XZ6Y will also match).
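To give the flavor of the "pivot" idea, here is a rough sketch. It is not my production code; the sub name pivot_regex and the exact rule (swap the two characters on either side of each digit) are assumptions made for this example. The builder just lists the allowed variants and joins them with "|":

```perl
use strict;
use warnings;

# Hypothetical builder: matches $term itself, plus each variant obtained
# by swapping the two characters on either side of a digit ("pivoting").
sub pivot_regex {
    my ($term) = @_;
    my %variants = ($term => 1);
    for my $i (1 .. length($term) - 2) {          # needs a neighbor on each side
        next unless substr($term, $i, 1) =~ /[0-9]/;
        my $v = $term;
        substr($v, $i - 1, 1) = substr($term, $i + 1, 1);
        substr($v, $i + 1, 1) = substr($term, $i - 1, 1);
        $variants{$v} = 1;
    }
    # plain alternation of literal strings, no fancy regex features
    my $pattern = join '|', map { quotemeta($_) } sort keys %variants;
    return qr/\A(?:$pattern)\z/;
}

my $re = pivot_regex('XY6Z');
print "XZ6Y is accepted\n" if 'XZ6Y' =~ $re;
```

The generated regex stays trivial; all the "intelligence" lives in the Perl code that builds it.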

Update:
I've gotten a couple of very helpful suggestions from Perlbotics (THANKS!). The first is that "." is the right syntax for the regex instead of "?". That is true and corrected above. My program uses a Windows wildcard syntax for the UI (it is a re-write of a legacy program) and I messed up the Perl regex syntax in this post. Another point is that the scheme above will also match the original string that the regex is based upon. I believe that this method from Perlbotics would exclude that: m/[^C]ACTCT|C[^A]CTCT|CA[^C]TCT|CAC[^T]CT|CACT[^C]T|CACTC[^T]/.

My actual code does much fancier things than just "one character different". There are rules that "pivot around a digit", allow transpositions (ABCD matches ACBD), allow one-character insertions and one-character deletions, and in general some very special rules for deciding that "thing A" is approximately "thing B". My code takes out the "search for" term before returning the results, so this "I will match myself" behavior is not an issue.
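A regex of the form Perlbotics suggested can be generated the same way. This is a sketch under my own assumptions: the sub name exactly_one_off_regex is made up, and I anchor the pattern to the whole string, which the regex quoted above does not do:

```perl
use strict;
use warnings;

# Hypothetical builder: each alternative negates exactly one position,
# so a match differs from $probe in that position and the probe itself
# never matches.
sub exactly_one_off_regex {
    my ($probe) = @_;
    my @alts;
    for my $i (0 .. length($probe) - 1) {
        my $char = substr($probe, $i, 1);
        push @alts, quotemeta(substr($probe, 0, $i))
                  . '[^' . quotemeta($char) . ']'
                  . quotemeta(substr($probe, $i + 1));
    }
    my $pattern = join '|', @alts;
    return qr/\A(?:$pattern)\z/;
}

my $re = exactly_one_off_regex('CACTCT');
for my $candidate (qw(CACTCT CACTGT GACTGT)) {
    my $verdict = $candidate =~ $re ? 'matches' : 'does not match';
    print "$candidate $verdict\n";
}
```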

Anyway, the main thrust of this post was that it is possible to create dynamic regexes and use them in Perl programs. What would otherwise be horrifically complex regexes with look-ahead and look-behind can be reduced to simple expressions IF you generate them from the Perl program, which you can! Of course there is a HUGE amount of application-specific stuff here. But think about dynamic regexes when appropriate.
