in reply to Re^2: skipping lines when parsing a file
use strict;
use warnings;

while (<DATA>) {
    # Skip everything from the COMMENT line through the FEATURES line ...
    print unless (/COMMENT/ .. /FEATURES/);
    # ... then print the FEATURES line itself, which closes the range.
    print if (/FEATURES/);
}
__DATA__
LOCUS       4                     302276 bp    DNA     linear   HTG 31-OCT-2008
DEFINITION  Mus musculus chromosome 4 NCBIM37 partial sequence
            138489260..138791535 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM37:4:138489260:138791535:-1
KEYWORDS    .
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
COMMENT     This sequence was annotated by the Ensembl system. Please visit
            the Ensembl web site, http://www.ensembl.org/ for more information.
            All feature locations are relative to the first (5') base of the
            sequence in this file. The sequence presented is always the
            forward strand of the assembly. Features that lie outside of the
            sequence contained in this file have clonal location coordinates
            in the format: <clone accession>.<version>:<start>..<end>
            The /gene indicates a unique id for a gene,
            /note="transcript_id=..." a unique id for a transcript,
            /protein_id a unique id for a peptide and note="exon_id=..." a
            unique id for an exon. These ids are maintained wherever possible
            between versions.
            All the exons and transcripts in Ensembl are confirmed by
            similarity to either protein or cDNA sequences.

            Features not parsed:
            gene AL807811.9.1.249021:69347..168228
            /locus_tag="Capzb" /gene="ENSMUSG00000028745" /note="capping protein (actin filament) muscle Z-line, beta [Source:MGI;Acc:MGI:104652]"
            mRNA join(complement(42593..42690),
            AL807811.9.1.249021:115325..115414,
            AL807811.9.1.249021:133677..133798,
            AL807811.9.1.249021:138400..138513,
            AL807811.9.1.249021:156412..156553,
            AL807811.9.1.249021:156884..157000,
            AL807811.9.1.249021:163445..163510,
            AL807811.9.1.249021:164172..164248,
            AL807811.9.1.249021:167419..168228) /gene="ENSMUSG00000028745" /note="transcript_id=ENSMUST00000102508"
            CDS join(complement(42593..42595),
            AL807811.9.1.249021:115325..115414,
            AL807811.9.1.249021:133677..133798,
            AL807811.9.1.249021:138400..138513,
            AL807811.9.1.249021:156412..156553,
            AL807811.9.1.249021:156884..157000,
            AL807811.9.1.249021:163445..163510,
            AL807811.9.1.249021:164172..164248,
            AL807811.9.1.249021:167419..167506) /db_xref="CCDS:CCDS18841.1" /db_xref="MGI:Capzb"
            /db_xref="Vega_mouse_transcript:OTTMUST00000022955"
            /protein_id="ENSMUSP00000099566" /gene="ENSMUSG00000028745" /note="transcript_id=ENSMUST00000102508"
FEATURES             Location/Qualifiers
     source          1..302276
                     /db_xref="taxon:10090"
                     /organism="Mus musculus"
     gene            complement(267261..268504)
                     /note="locus_tag=Rnf186"
                     /gene="ENSMUSG00000070661"
                     /note="ring finger protein 186 [Source:MGI;Acc:MGI:1914075]
This is the output:
LOCUS       4                     302276 bp    DNA     linear   HTG 31-OCT-2008
DEFINITION  Mus musculus chromosome 4 NCBIM37 partial sequence
            138489260..138791535 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM37:4:138489260:138791535:-1
KEYWORDS    .
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
FEATURES             Location/Qualifiers
     source          1..302276
                     /db_xref="taxon:10090"
                     /organism="Mus musculus"
     gene            complement(267261..268504)
                     /note="locus_tag=Rnf186"
                     /gene="ENSMUSG00000070661"
                     /note="ring finger protein 186 [Source:MGI;Acc:MGI:1914075]
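
If testing /FEATURES/ twice bothers you, the two print statements can probably be folded into one by looking at what the range operator returns. In scalar context it yields a sequence number while the range is active, with "E0" appended on the line that closes the range. The sketch below is only an illustration under that assumption; skip_comment_block is a made-up helper name, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that survive the filter.
sub skip_comment_block {
    my @lines = @_;
    my @kept;
    for (@lines) {
        # Empty string outside the range, 1, 2, 3 ... inside it,
        # and "NE0" on the line that matches /FEATURES/.
        my $in_range = /COMMENT/ .. /FEATURES/;
        # Keep lines outside the range, plus the closing FEATURES line.
        push @kept, $_ if !$in_range || $in_range =~ /E0$/;
    }
    return @kept;
}

print skip_comment_block(
    "LOCUS\n", "COMMENT text\n", "more comment\n", "FEATURES\n", "source\n",
);
```

Note that the range operator keeps its state between calls, so if the input has a COMMENT line but never reaches FEATURES, a later call to the helper would start out still inside the range.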