Re: skipping lines when parsing a file

Re^2: skipping lines when parsing a file by lomSpace (Scribe) on Aug 20, 2009 at 13:41 UTC
Hi ssandv, I want to remove the text from the line starting with "COMMENT" up to just before the line that starts with "FEATURES", and then print from "FEATURES" to the end of the file. LomSpace
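Besides tracking a state variable, one alternative (not something proposed in this thread) is Perl's scalar range or "flip-flop" operator `..`, which is true from the line matching its left operand through the line matching its right operand. On that last line its value ends in "E0", so the FEATURES line itself can still be kept. A minimal sketch, assuming the file is already in a string; `strip_comment_section` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: drop every line from the one starting "COMMENT"
# up to (but not including) the line starting "FEATURES".
sub strip_comment_section {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my $out = '';
    while (my $line = <$in>) {
        # False ("") until /^COMMENT/ matches, then a counter (1, 2, ...)
        # until /^FEATURES/ matches; on that final line the value ends in "E0".
        my $in_range = ($line =~ /^COMMENT/) .. ($line =~ /^FEATURES/);
        $out .= $line if !$in_range || $in_range =~ /E0$/;
    }
    close $in;
    return $out;
}

my $sample = <<'END';
LOCUS       4
COMMENT     This is a comment
            more comment text
FEATURES             Location/Qualifiers
     source          1..302276
END

print strip_comment_section($sample);
```

Note that a flip-flop keeps its state between calls of the same sub, so this sketch assumes every COMMENT section is eventually followed by a FEATURES line.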
Re^3: skipping lines when parsing a file by ssandv (Hermit) on Aug 20, 2009 at 14:51 UTC
So it appears that sections of the file are defined by words in all caps starting in column 0. This lends itself well to keeping track of the state (in this case, the file section) you're in. There are many other ways to do it, but this is an example of what I was suggesting:

```perl
my $state = '';    # current section; empty until the first header is seen
while (my $line = <$in>) {
    # A word in capitals at column 0 starts a new section.
    if ($line =~ /^([A-Z]+)/) {
        $state = $1;
    }
    print $line unless $state eq "COMMENT";
}
```

which outputs:

```
07:37<sandvik@sat1> ~/perl$ ./pmtest.pl
LOCUS       4 302276 bp DNA linear HTG 31-OCT-2008
DEFINITION  Mus musculus chromosome 4 NCBIM37 partial sequence
            138489260..138791535 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM37:4:138489260:138791535:-1
KEYWORDS    .
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
FEATURES             Location/Qualifiers
     source          1..302276
                     /db_xref="taxon:10090"
                     /organism="Mus musculus"
     gene            complement(267261..268504)
                     /note="locus_tag=Rnf186"
                     /gene="ENSMUSG00000070661"
                     /note="ring finger protein 186 [Source:MGI;Acc:MGI:1914075]
```
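The same state-tracking idea can be wrapped in a reusable sub that drops any set of sections, not just COMMENT. This is a sketch rather than code from the thread: the sub name `drop_sections`, its argument list, and reading from an in-memory string instead of a file are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical generalisation of the state approach: @sections names the
# all-caps keywords (in column 0) whose sections should be dropped.
sub drop_sections {
    my ($text, @sections) = @_;
    my %skip = map { $_ => 1 } @sections;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my $state = '';    # current section; '' before the first header
    my $out   = '';
    while (my $line = <$in>) {
        # Capital letters at column 0 open a new section.
        $state = $1 if $line =~ /^([A-Z]+)/;
        $out .= $line unless $skip{$state};
    }
    close $in;
    return $out;
}

my $sample = "LOCUS x\nCOMMENT a\n  b\nFEATURES y\n";
print drop_sections($sample, 'COMMENT');
```

Using a hash keeps the loop unchanged no matter how many sections are skipped.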
Re^4: skipping lines when parsing a file by lomSpace (Scribe) on Aug 20, 2009 at 17:02 UTC
ssandv, OK, this is pretty simple and straightforward. The regex captures the capitalized word at the beginning of a line, and the state stays the same until the next line that begins with a capitalized word. When it comes across that line, does it start printing again? When I run it:

```perl
#!/usr/bin/perl -w
use strict;

=pod

The script parses the targeting.gb file and creates a new file
with the comment info removed.

=cut

open(my $in, '<', "C:/Documents and Settings/mydir/Desktop/TARGETING.gb")
    or die "Cannot open TARGETING.gb: $!";
open(my $out, '>', "C:/Documents and Settings/mydir/Desktop/TARGET.gb")
    or die "Cannot create TARGET.gb: $!";

my $state = '';
while (my $line = <$in>) {
    if ($line =~ /^([A-Z]+)/) {
        $state = $1;
    }
    print $out $line unless $state eq "COMMENT";
}
close $in;
close $out;
```

the file is not being parsed. The output file is still the same as the input. Any idea? LomSpace
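To see exactly when printing resumes, it can help to trace the state line by line. The hypothetical `trace_states` sub below (not from the thread) applies the same regex and records, for each line, the current section and whether the line would be kept. On a real file, COMMENT lines reported as "keep" would suggest that the COMMENT header is not being matched at column 0.

```perl
use strict;
use warnings;

# Hypothetical debugging aid: for each line, record the section held in
# the state variable and whether the line would be printed.
sub trace_states {
    my ($text) = @_;
    my $state = '';
    my @trace;
    for my $line (split /^/, $text) {    # split into lines, keeping "\n"
        $state = $1 if $line =~ /^([A-Z]+)/;
        push @trace, [ $state, $state eq 'COMMENT' ? 'drop' : 'keep' ];
    }
    return @trace;
}

my $sample = "LOCUS x\nCOMMENT a\n   b\nFEATURES y\n   z\n";
my @lines  = split /^/, $sample;
my @trace  = trace_states($sample);
for my $i (0 .. $#lines) {
    printf "%-8s %-4s %s", @{ $trace[$i] }, $lines[$i];
}
```

For this sample, the two COMMENT lines are marked "drop" and printing resumes at the FEATURES line, because that header replaces the stored state.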
Re^5: skipping lines when parsing a file by ssandv (Hermit) on Aug 20, 2009 at 17:46 UTC
Re^4: skipping lines when parsing a file by lomSpace (Scribe) on Aug 20, 2009 at 17:42 UTC
I am doing this on Windows. Could that be why it is not parsing? LomSpace
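One quick check: Windows line endings by themselves should not stop the section regex from matching, because `^([A-Z]+)` is anchored at the start of the line and never looks at a trailing "\r\n". A minimal sketch with hard-coded CRLF-terminated lines (the sample lines are made up for illustration):

```perl
use strict;
use warnings;

# Made-up lines as they might arrive from a CRLF file read without
# line-ending translation.
my @lines = (
    "LOCUS       4\r\n",
    "COMMENT     text\r\n",
    "            more\r\n",
    "FEATURES    x\r\n",
);

my $state = '';
my @kept;
for my $line (@lines) {
    # Anchored at the start of the line, so the trailing "\r" is irrelevant.
    $state = $1 if $line =~ /^([A-Z]+)/;
    push @kept, $line unless $state eq 'COMMENT';
}
printf "%d of %d lines kept\n", scalar @kept, scalar @lines;    # 2 of 4 lines kept
```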
Re^5: skipping lines when parsing a file by ssandv (Hermit) on Aug 20, 2009 at 17:50 UTC