lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hi
I have a file, and I want to skip everything between two regexes and print the rest,
or just skip lines that match a pattern.
    while(<$in>){
        print $out $_;
        last if $_ =~ /Mus\./; # line number before change
    }
    my $line = <$in>;
    next if $line =~ /(^\t\t\t\D)|(^\t\t\t\t\D)/;
    while(<$in>){
        # print the rest of the lines
        #next unless $. =~ /^\t\t\t\d|\D/;
        print $out $_;
    }
    close $in;
    close $out;

This currently returns the file unchanged.

Replies are listed 'Best First'.
Re: skipping lines when parsing a file
by ssandv (Hermit) on Aug 19, 2009 at 21:35 UTC

    It might be helpful if you told us what patterns you were looking for using English, because your regexes seem a bit strange.

    As for solving your basic problem (print until X line, then stop until Y line, then start again), you might consider a state machine; that is, use a global variable that keeps track of whether you're printing or not, and change it inside the loop if the conditions change--and then just print at the end of the loop when the variable tells you you're supposed to be printing.

      Hi ssandv
      I want to remove the text starting at "COMMENT" and ending just before the line that starts with "FEATURES",
      then print from "FEATURES" until the end of the file.
      LomSpace

        So it appears that sections of the file are defined by words in all caps starting in column 0. This actually lends itself pretty well to keeping track of the state (in this case the file section) you're in.

        There are many other ways to do it, but this is an example of what I was suggesting:

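        my $state = '';    # current section; empty until the first section header is seen
        while (my $line=<$in>) {
            if ($line=~/^([A-Z]+)/) {
                $state=$1;    # a word in all caps at column 0 starts a new section
            }
            print $line unless $state eq "COMMENT";
        }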
        which outputs:
        07:37<sandvik@sat1> ~/perl$ ./pmtest.pl
        LOCUS       4 302276 bp DNA linear HTG 31-OCT-2008
        DEFINITION  Mus musculus chromosome 4 NCBIM37 partial sequence 138489260..138791535 reannotated via EnsEMBL
        ACCESSION   chromosome:NCBIM37:4:138489260:138791535:-1
        KEYWORDS    .
        SOURCE      house mouse.
          ORGANISM  Mus musculus
                    Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                    Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
                    Sciurognathi; Muroidea; Muridae; Murinae; Mus.
        FEATURES             Location/Qualifiers
             source          1..302276
                             /db_xref="taxon:10090"
                             /organism="Mus musculus"
             gene            complement(267261..268504)
                             /note="locus_tag=Rnf186"
                             /gene="ENSMUSG00000070661"
                             /note="ring finger protein 186 [Source:MGI;Acc:MGI:1914075]
Re: skipping lines when parsing a file
by superfrink (Curate) on Aug 19, 2009 at 21:42 UTC
Re: skipping lines when parsing a file
by toolic (Bishop) on Aug 19, 2009 at 23:46 UTC
      Hi toolic!
      This is a dynamic situation, so range operators won't do.
      I want to remove the text starting at "COMMENT" and ending
      just before the line that starts with "FEATURES",
      then print from "FEATURES" until the end of the file.
      LomSpace
        Perhaps I do not understand your requirements. It would help if you were to also post your desired output. Here is an example using range operators:
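A minimal sketch of a range-operator (flip-flop) solution for the requirement stated above, not necessarily the listing originally posted in this reply. It assumes the section to drop starts at a line beginning with COMMENT and ends just before the line beginning with FEATURES; the helper name drop_comment_block and the sample lines are hypothetical. In scalar context the range operator returns a sequence number while the range is active and appends "E0" on the line that closes it, which is what lets the FEATURES line through.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the input lines minus the COMMENT section.
# The FEATURES line that closes the range is kept.
sub drop_comment_block {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        # Scalar-context ".." is the flip-flop: false ("") outside the range,
        # 1, 2, 3... inside it, with "E0" appended on the closing line.
        my $seq = ($line =~ /^COMMENT/) .. ($line =~ /^FEATURES/);
        next if $seq && $seq !~ /E0$/;    # inside the range, but not its last line
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample data in the same shape as the file discussed in the thread.
my @sample = (
    "LOCUS       4 302276 bp\n",
    "COMMENT     first comment line\n",
    "            continuation of the comment\n",
    "FEATURES             Location/Qualifiers\n",
    "     source          1..302276\n",
);
print drop_comment_block(@sample);
```

One thing to keep in mind with this design: the flip-flop keeps its state between calls to the sub, so input in which COMMENT is never followed by FEATURES would leave the range switched on for the next call.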

        This is the output:

Re: skipping lines when parsing a file
by didess (Sexton) on Aug 19, 2009 at 22:59 UTC
    Hi !

    First: "next" what? On line 6 of your code sample, you are not in a loop.

    To skip a block of lines between two regexes, including the lines that match the regexes themselves, I would write:

    my $inBlock = 0;
    while(<$in>) {
        if ($inBlock) {
            $inBlock = 0 if ( $_ =~ /2nd Regex/);
        } else {
            $inBlock = 1 if ( $_ =~ /1st Regex/);
            print $_ if(!$inBlock);
        }
    }
    Hope it helps (not tested!)
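For the case described earlier in the thread, where the closing line (the one starting with FEATURES) has to be kept, the same flag idea might be adapted as in the sketch below. The helper name skip_block, its parameters, and the sample text are all hypothetical; in-memory filehandles stand in for the real input and output files.

```perl
use strict;
use warnings;

# Hypothetical helper: copies lines from $in to $out, skipping from the first
# line matching $start up to, but not including, the next line matching $end.
sub skip_block {
    my ($in, $out, $start, $end) = @_;
    my $in_block = 0;
    while (my $line = <$in>) {
        if ($in_block) {
            $in_block = 0 if $line =~ $end;      # the closing line ends the block
        }
        else {
            $in_block = 1 if $line =~ $start;    # the opening line starts it
        }
        # Printing after the flag update means the opening line is dropped
        # and the closing line is printed.
        print {$out} $line unless $in_block;
    }
    return;
}

# Made-up sample text; scalar references give in-memory filehandles.
my $text = "LOCUS       4\n"
         . "COMMENT     some comment\n"
         . "            more comment\n"
         . "FEATURES    Location/Qualifiers\n"
         . "     source 1..302276\n";
my $result = '';
open my $in,  '<', \$text   or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
skip_block($in, $out, qr/^COMMENT/, qr/^FEATURES/);
close $in;
close $out;
print $result;
```

Passing the patterns as qr// objects keeps the loop reusable for other start and end markers.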
Re: skipping lines when parsing a file
by baxy77bax (Deacon) on Aug 19, 2009 at 23:01 UTC
    hi,

    ssandv gave you a good pointer. From the code you presented, you loop through the file printing lines, then exit the loop while keeping the filehandle open (which I strongly disapprove of), then try to read a single line from the filehandle, and then repeat the first step. You have to admit your intentions are a bit confusing ;) Maybe something like this will help:

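    use strict;
    ...
    my $trigger = 0;
    while(<$in>){
        print $out $_ if ($trigger == 0);
        # no /g here: in scalar context it would carry pos($_) from one match into the next
        $trigger = 1 if ($_=~/match something|or something else/);   # pattern that starts the skipped block
        $trigger = 0 if ($_=~/match something|or something else/);   # pattern that ends it (must differ from the start pattern)
    }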
    This wasn't tested because I had nothing to test it on, but you can embed it in your script and try it on real data to see.

    cheers