in reply to Extracting multiple rows in a text file with a regex.
If I may add this: you can step through your data line by line, picking out what you want. To get each "Nucleotide Sequence", since a blank line is used as the record delimiter, you could use Perl's range operator (..), commonly called the "flip-flop" operator, like so:
use warnings;
use strict;

while (<DATA>) {
    if (/Name:\s+?(.+?)$/) {
        print $1, $/;
    }
    if (/Nucleotide Sequence/ .. /^\s*$/) {    # use "flip-flop" operator
        s/.*:\s+?//;    # remove the Nucleotide Sequence to :, then print
        print;
    }
}
__DATA__
GeneID: 1002
Name: cadherin 4, type 1, R-cadherin (retinal)
Chromo: 20
Cytoband: 20q13.3
Nucleotide Sequence: atgaccgcgggcgccggcgtgctccttctgctgctctcgctctccggc
acagcgagactggagatatcgtcacagtggcggctggcctggaccgagagaaagttcagcagtacacag
cagcttgcgcatcctgtacctggaggccgggatgtatgacgtccccatcatcgtcacagactctggaaa

GeneID: 10077
Name: tetraspanin 32
Chromo: 11
Cytoband: 11p15.5
Nucleotide Sequence: atggggccttggagtcgagtcagggttgccaaatgccagatgctggtc

GeneID: 10078
Name: tumor suppressing subtransferable candidate 4
Chromo: 11
Cytoband: 11p15.5
Nucleotide Sequence: atggctgaggcaggaacaggtgagccgtcccccagcgtggagggcgaa

Produces:

cadherin 4, type 1, R-cadherin (retinal)
atgaccgcgggcgccggcgtgctccttctgctgctctcgctctccggc
acagcgagactggagatatcgtcacagtggcggctggcctggaccgagagaaagttcagcagtacacag
cagcttgcgcatcctgtacctggaggccgggatgtatgacgtccccatcatcgtcacagactctggaaa

tetraspanin 32
atggggccttggagtcgagtcagggttgccaaatgccagatgctggtc

tumor suppressing subtransferable candidate 4
atggctgaggcaggaacaggtgagccgtcccccagcgtggagggcgaa
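If you would rather have each sequence as one string instead of printing it line by line, the same flip-flop test can feed a hash. This is only a minimal sketch under some assumptions: the sub name sequences_by_name is made up for illustration, gene names are assumed to be unique, each record is assumed to list "Name:" before "Nucleotide Sequence:", and the sample lines in the usage part are toy data, not the real records.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): returns a hash ref that
# maps each gene name to its sequence, joined into a single string.
sub sequences_by_name {
    my @lines = @_;
    my (%seq_for, $name);

    # The extra "" acts as a closing blank line, so the flip-flop is always
    # switched off again before the sub returns.
    for my $line (@lines, "") {
        chomp(my $copy = $line);

        $name = $1 if $copy =~ /^Name:\s+(.+?)\s*$/;

        # True from the "Nucleotide Sequence" line up to and including
        # the next blank line.
        if ($copy =~ /^Nucleotide Sequence:/ .. $copy =~ /^\s*$/) {
            next unless defined $name;
            (my $chunk = $copy) =~ s/^.*:\s*//;   # drop the label, if any
            $chunk =~ s/\s+//g;                   # drop stray whitespace
            $seq_for{$name} .= $chunk;
        }
    }
    return \%seq_for;
}

# Usage with toy data (illustrative only).
my $seqs = sequences_by_name(
    "GeneID: 1\n",
    "Name: example gene\n",
    "Nucleotide Sequence: atgacc\n",
    "gcgggc\n",
    "\n",
);
print "$_ => $seqs->{$_}\n" for sort keys %$seqs;
```

The empty string appended to the loop is there because, as far as I know, the range operator keeps its on/off state between calls to the same sub. Without it, input that ends in the middle of a sequence could leave the flip-flop switched on for the next call.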
Re^2: Extracting multiple rows in a text file with a regex.
by Skeeve (Parson) on Jul 29, 2013 at 06:05 UTC