This appears to work as I interpret your description, though there is at least one ambiguity in it, so I may have guessed wrongly about what you meant.
#! perl -slw
use strict;

# Capture the character before ",<id>,<number>,ResGen" in $1 and the rest in $2.
# Note: (?:\w+) needs at least one character, so a line such as 005, whose
# number field is empty, is left unchanged.
my $re_resgen = qr[(.)(,(?:\w+),(?:\w+),ResGen)];

while (<DATA>) {
    chomp;    # -l sets $\ so print appends "\n"; chomp avoids doubling it
    # Append a closing double quote unless the captured character already is one.
    s[$re_resgen][my $t=$1; $t.='"' unless $t eq '"'; $t.$2]e;
    print;
}

=pod output

c:\test>234040
001 GENE1="Rattus norvegicus serum and glucocorticoid-regulated kinase (sgk) mRNA, complete cds",NM_019232,333,ResGen,ATP binding|protein serine/threonine kinase|protein amino acid phosphorylation,,,,29517
002 GENE2="ESTs, Weakly similar to putative serine/threonine protein kinase MAK-V [M.musculus]",NM_144755,331,ResGen,,,,,246273
003 GENE3="Thiosulfate sulphurtransferase (rhodanese)",X56228,329,ResGen,mitochondrion|sulfate transport| thiosulfate sulfurtransferase,,,,25274
004 GENE4="Spleen tyrosine kinase",NM_012758,327,ResGen,ATP binding|protein tyrosine kinase|intracellular signaling cascade|protein amino acid phosphorylation,,,,25155
005 GENE5="Spleen kinase 24,NM_012758,,ResGen,ATP binding|protein tyrosine kinase|intracellular signaling cascade|protein amino acid phosphorylation,,,,25155

=cut

__DATA__
001 GENE1="Rattus norvegicus serum and glucocorticoid-regulated kinase (sgk) mRNA, complete cds,NM_019232,333,ResGen,ATP binding|protein serine/threonine kinase|protein amino acid phosphorylation,,,,29517
002 GENE2="ESTs, Weakly similar to putative serine/threonine protein kinase MAK-V [M.musculus]",NM_144755,331,ResGen,,,,,246273
003 GENE3="Thiosulfate sulphurtransferase (rhodanese)",X56228,329,ResGen,mitochondrion|sulfate transport| thiosulfate sulfurtransferase,,,,25274
004 GENE4="Spleen tyrosine kinase,NM_012758,327,ResGen,ATP binding|protein tyrosine kinase|intracellular signaling cascade|protein amino acid phosphorylation,,,,25155
005 GENE5="Spleen kinase 24,NM_012758,,ResGen,ATP binding|protein tyrosine kinase|intracellular signaling cascade|protein amino acid phosphorylation,,,,25155
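The regex above requires at least one character in each of the two fields before ResGen (`(?:\w+)`), which is why line 005, whose number field is empty, passes through untouched. If empty fields should also be accepted (an assumption about your data, not something the question states), a minimal sketch is to change the second `\w+` to `\w*`. The `close_quote` helper and the shortened sample lines are hypothetical and exist only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant: allow an empty number field (e.g. ",NM_012758,,ResGen")
# by using \w* for the second field. $1 is the character before the first comma.
my $re_resgen = qr[(.)(,\w+,\w*,ResGen)];

# Close the GENE description with a double quote unless it is already closed.
sub close_quote {
    my ($line) = @_;
    $line =~ s[$re_resgen][ ($1 eq '"' ? $1 : $1 . '"') . $2 ]e;
    return $line;
}

# Shortened sample lines based on records 002, 004 and 005.
my @lines = (
    '002 GENE2="ESTs, Weakly similar",NM_144755,331,ResGen,,,,,246273',
    '004 GENE4="Spleen tyrosine kinase,NM_012758,327,ResGen,ATP binding',
    '005 GENE5="Spleen kinase 24,NM_012758,,ResGen,ATP binding',
);

print close_quote($_), "\n" for @lines;
```

Run as-is, it should leave line 002 alone (its quote is already closed) and add the closing quote on both 004 and 005. The only change from the original is `\w*`; everything else mirrors the `s///e` substitution in the post.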
Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
In reply to Re: Need method to create Regular expression for known pattern in the middle of a line by BrowserUk, in thread Need method to create Regular expression for known pattern in the middle of a line by Ya