in reply to Re^2: DNA Pattern Matching
in thread DNA Pattern Matching

This reply may have been so long delayed as to render it otiose; for the sake of completeness, however, I'll post it.

So the $ppi_pm_seq is defined as an example sequence. ... The $n is defined as 3, since we are after the middle base of the 7-mer sequence. ... $my_extraction looks to set the pattern of the sequence into three variables that instead of being $1, $2, and $3 ... containing the first 3 bases ... the middle base, and the last three bases? The /A designates the first pattern to match only at the beginning of the string, while the \z directs the third pattern to match only at the end of the string, correct? Is the xms required?

This seems like a good recap of the regex parsing/extraction section of my example code. Yes, \A and \z (backslash, not /A) are absolute anchors: \A matches only at the very start of the string and \z only at the very end. The /x of the /xms regex modifier "tail" is certainly needed because I'm using whitespace in the regex for readability; without /x, the regex engine would try to match the literal whitespace in the regex against corresponding whitespace in the string, which is not what you want. The /m and /s modifiers of the tail are not strictly needed in the given regex (/m only changes the behavior of ^ and $, and /s only changes what . matches), but they are there because every qr//, m// and s/// expression I write uses a standard /xms tail. This is in line with the regex-specific recommendations of TheDamian's Perl Best Practices (PBP). Not everyone agrees with all of the PBPs, and neither do I, but I warmly embrace and heartily recommend all the regex BPs.
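To make the anchor and /x points concrete, here is a minimal sketch. It is not the code from my earlier post; the sample sequence and the variable names $seq, $with_x and $without_x are made up for illustration, and it assumes a 7-mer with $n = 3 flanking bases on each side.

```perl
use strict;
use warnings;

# Hypothetical 7-mer; n is the number of flanking bases on each side.
my $seq = 'ACGTGCA';
my $n   = 3;

# With /x, whitespace and comments inside the regex are ignored.
my ($before, $mid, $after) = $seq =~ m{
    \A              # absolute start of string
    ([ACGT]{$n})    # flanking bases before the middle
    ([ACGT])        # the middle base
    ([ACGT]{$n})    # flanking bases after the middle
    \z              # absolute end of string
}xms;

print "$before $mid $after\n";    # prints "ACG T GCA"

# Same pattern text with and without /x: without it, the spaces in the
# regex must match literal spaces in the string, so the match fails.
my $with_x    = 'ACGT' =~ m{ \A AC GT \z }xms ? 1 : 0;
my $without_x = 'ACGT' =~ m{ \A AC GT \z }ms  ? 1 : 0;

print "with /x: $with_x, without /x: $without_x\n";
```

The second test is the whole argument for /x in one line: the pattern text is identical, and only the modifier decides whether the embedded spaces are layout or literal characters.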

The next block is confusing to me ... you are setting each of those variables to be the evaluation of the pattern variables (map{block}) $before and $after with the system default variable $_ serving to hold the center base for each $mid? and then following it with an evaluation where the center base is not equal to the target stored in $mid? for the list of A,T,C, and G?

As I have said, I really don't understand your overall intent. However, I latched on to the "... use a series of if/elsif for each of the four possible combinations of $2 (A, C, T, G) to store the seq_id's for each group ..." passage in the OP and thought I could give an example of permuting all possible combinations. Once again, your recap of the actions of my code example seems accurate.

Is this just creating the possible permutations of the target string based on the possible outcomes?

Yes. Whether this is what you need or not is still beyond me.
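For what it's worth, the permutation step you describe can be sketched roughly as below. Again, this is an illustration and not the code from my earlier post: the values of $before, $mid and $after are assumed to have come from an extraction like the one discussed above, and @variants is a name I've invented here.

```perl
use strict;
use warnings;

# Assumed results of the extraction step (hypothetical values).
my ($before, $mid, $after) = ('ACG', 'T', 'GCA');

# grep drops the base already present in the middle position;
# map then rebuilds the 7-mer around each remaining base, held in $_.
my @variants =
    map  { $before . $_ . $after }
    grep { $_ ne $mid }
    qw(A C G T);

print "$_\n" for @variants;
```

Reading the map/grep chain from the bottom up: start with the four bases, keep the ones that differ from $mid, and glue each survivor between the two flanks. If you want the original sequence included as well, just leave out the grep.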

Update: Made a number of small changes to enhance clarity. Since none of the changes seem to affect context, I will not bother to enumerate them.


Give a man a fish:  <%-{-{-{-<