Program Input:
Following are of interest: **carboxypeptidase** protein $$inhibitor$$ ( **CI** ) , **nanopeptidase** kinase $$inhibitor$$ , **NI** , and others such as , **p(57)** and **polypeptidase** protein $$inhibitor$$ ( **PI** ).
Program Output:
1. Following are of interest: **carboxypeptidase_protein_inhibitor_(CI)** , **nanopeptidase_kinase_inhibitor_(NI)** and others such as , **p(57)** and **polypeptidase_protein_inhibitor_(PI)**.
2. Following are of interest: **carboxypeptidase** protein $$inhibitor$$ ( **CI** ) , nanopeptidase kinase inhibitor , NI , and others such as , p(57) and polypeptidase protein inhibitor ( PI ).
3. Following are of interest: carboxypeptidase protein inhibitor ( CI ) , **nanopeptidase** kinase $$inhibitor$$ , **NI** , and others such as , p(57) and polypeptidase protein inhibitor ( PI ).
4. Following are of interest: carboxypeptidase protein inhibitor ( CI ) , nanopeptidase kinase inhibitor , NI , and others such as , p(57) and **polypeptidase** protein $$inhibitor$$ ( **PI** ).
I can produce output 1 with the regular expression substitution shown below, but I cannot figure out how to produce output sentences 2, 3 and 4.
if ($line =~ /\*\*([^\*]+)\*\*\s(kinase|isoform|protein|peptide|ligand)\s\$\$([^\$]+)\$\$\s[\(\,]\s\*\*([^\*]+)\*\*\s[\)\,]/) {
    $line =~ s/\*\*([^\*]+)\*\*\s(kinase|isoform|protein|peptide|ligand)\s\$\$([^\$]+)\$\$\s[\(\,]\s\*\*([^\*]+)\*\*\s[\)\,]/**$1_$2_$3_($4)**/g;
    print WF "$line\n";
}
Output sentence 1 is the original sentence with every substitution applied by the code above (there are 3 substitutions in this example, although the number varies with the sentence).
Each of the remaining output sentences (2, 3 and 4) is the original input sentence with the original pattern retained at exactly one substitution location and the tags (** and $$) removed everywhere else in the sentence. The number of such output sentences therefore equals the number of patterns the regex above substitutes (3 in this example, as output 1 shows). Is there a nice way of getting outputs 2, 3 and 4?
Appreciate your help.
Thanks very much in advance.
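One possible approach, offered as a minimal sketch rather than a definitive answer: first record the start and end offset of every match (Perl stores these in @- and @+), then build one sentence per match by stripping the tags from the text before and after that match while leaving the match itself untouched. The names strip_tags and one_match_per_sentence are hypothetical helpers introduced here for illustration, and the pattern is the one from the code above with the capture groups made non-capturing.

```perl
use strict;
use warnings;

my $line = 'Following are of interest: **carboxypeptidase** protein $$inhibitor$$ ( **CI** ) , **nanopeptidase** kinase $$inhibitor$$ , **NI** , and others such as , **p(57)** and **polypeptidase** protein $$inhibitor$$ ( **PI** ).';

# Same pattern as in the question; captures are not needed here.
my $pattern = qr/\*\*[^*]+\*\*\s(?:kinase|isoform|protein|peptide|ligand)\s\$\$[^\$]+\$\$\s[(,]\s\*\*[^*]+\*\*\s[),]/;

# Hypothetical helper: remove every ** and $$ tag from a piece of text.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/\*\*|\$\$//g;
    return $text;
}

# Hypothetical helper: return one sentence per match, with the tags kept
# only inside that match and removed everywhere else.
sub one_match_per_sentence {
    my ($text) = @_;

    # Collect the [start, end) offsets of all matches first.
    my @spans;
    while ($text =~ /$pattern/g) {
        push @spans, [ $-[0], $+[0] ];
    }

    my @out;
    for my $span (@spans) {
        my ($start, $end) = @$span;
        my $before = strip_tags(substr($text, 0, $start));
        my $match  = substr($text, $start, $end - $start);
        my $after  = strip_tags(substr($text, $end));
        push @out, $before . $match . $after;
    }
    return @out;
}

print "$_\n" for one_match_per_sentence($line);
```

For the sample input this should print three sentences corresponding to outputs 2, 3 and 4. Collecting the offsets before modifying anything keeps them valid, because each output sentence is built from the unmodified input.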
In reply to regex pattern match problem by newbio