in reply to Re: skipping lines when parsing a file
in thread skipping lines when parsing a file

Hi toolic!
This is a dynamic situation, so range operators won't do.
I want to remove the text from the line that starts with "COMMENT" up to,
but not including, the line that starts with "FEATURES".
LOCUS 4 302276 bp DNA linear HTG 31-OCT-2008
DEFINITION Mus musculus chromosome 4 NCBIM37 partial sequence 138489260..138791535 reannotated via EnsEMBL
ACCESSION chromosome:NCBIM37:4:138489260:138791535:-1
KEYWORDS .
SOURCE house mouse.
  ORGANISM Mus musculus
    Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
    Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
    Sciurognathi; Muroidea; Muridae; Murinae; Mus.
COMMENT This sequence was annotated by the Ensembl system. Please visit
    the Ensembl web site, http://www.ensembl.org/ for more information.
    All feature locations are relative to the first (5') base of the
    sequence in this file. The sequence presented is always the
    forward strand of the assembly. Features that lie outside of the
    sequence contained in this file have clonal location coordinates
    in the format: <clone accession>.<version>:<start>..<end>
    The /gene indicates a unique id for a gene,
    /note="transcript_id=..." a unique id for a transcript,
    /protein_id a unique id for a peptide and note="exon_id=..." a
    unique id for an exon. These ids are maintained wherever possible
    between versions.
    All the exons and transcripts in Ensembl are confirmed by
    similarity to either protein or cDNA sequences.
Features not parsed:
    gene AL807811.9.1.249021:69347..168228
        /locus_tag="Capzb"
        /gene="ENSMUSG00000028745"
        /note="capping protein (actin filament) muscle Z-line, beta [Source:MGI;Acc:MGI:104652]"
    mRNA join(complement(42593..42690),
        AL807811.9.1.249021:115325..115414,
        AL807811.9.1.249021:133677..133798,
        AL807811.9.1.249021:138400..138513,
        AL807811.9.1.249021:156412..156553,
        AL807811.9.1.249021:156884..157000,
        AL807811.9.1.249021:163445..163510,
        AL807811.9.1.249021:164172..164248,
        AL807811.9.1.249021:167419..168228)
        /gene="ENSMUSG00000028745"
        /note="transcript_id=ENSMUST00000102508"
    CDS join(complement(42593..42595),
        AL807811.9.1.249021:115325..115414,
        AL807811.9.1.249021:133677..133798,
        AL807811.9.1.249021:138400..138513,
        AL807811.9.1.249021:156412..156553,
        AL807811.9.1.249021:156884..157000,
        AL807811.9.1.249021:163445..163510,
        AL807811.9.1.249021:164172..164248,
        AL807811.9.1.249021:167419..167506)
        /db_xref="CCDS:CCDS18841.1"
        /db_xref="MGI:Capzb"
        /db_xref="Vega_mouse_transcript:OTTMUST00000022955"
        /protein_id="ENSMUSP00000099566"
        /gene="ENSMUSG00000028745"
        /note="transcript_id=ENSMUST00000102508"
FEATURES Location/Qualifiers
    source 1..302276
        /db_xref="taxon:10090"
        /organism="Mus musculus"
    gene complement(267261..268504)
        /note="locus_tag=Rnf186"
        /gene="ENSMUSG00000070661"
        /note="ring finger protein 186 [Source:MGI;Acc:MGI:1914075]

Then print everything from "FEATURES" until the end of the file.
LomSpace
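
For readers following along, the requirement stated above could be sketched roughly as follows. This is not code from the thread: it uses a plain flag rather than the range operator under discussion, the helper name strip_comment_block is hypothetical, and the sample record is a made-up abbreviation. It assumes that "COMMENT" and "FEATURES" each appear at the very start of a line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop every line from the one
# that starts with COMMENT up to, but not including, the one that starts
# with FEATURES. All other lines are returned unchanged and in order.
sub strip_comment_block {
    my @lines = @_;
    my @kept;
    my $skipping = 0;
    for my $line (@lines) {
        $skipping = 1 if $line =~ /^COMMENT/;    # unwanted block begins here
        $skipping = 0 if $line =~ /^FEATURES/;   # FEATURES line itself is kept
        push @kept, $line unless $skipping;
    }
    return @kept;
}

# Made-up, heavily abbreviated record, for illustration only.
my @record = (
    "LOCUS 4 302276 bp DNA\n",
    "COMMENT This sequence was annotated by the Ensembl system.\n",
    "    All feature locations are relative to the first base.\n",
    "Features not parsed:\n",
    "FEATURES Location/Qualifiers\n",
    "    source 1..302276\n",
);

print strip_comment_block(@record);
```

To run the same filter over a real file, one could pass the lines read from an open filehandle, for example print strip_comment_block(<$fh>), where $fh is assumed to be a handle opened on the GenBank file. If a record contains no FEATURES line after COMMENT, everything from COMMENT onward is dropped.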

Re^3: skipping lines when parsing a file
by toolic (Bishop) on Aug 20, 2009 at 18:08 UTC
    Perhaps I do not understand your requirements. It would help if you were to also post your desired output. Here is an example using range operators:

    This is the output: