
In the other code posted here, I see that you're reading the file line by line. You can't match more than one line if you're reading/processing each line separately. You will need to either set a flag or collect all information up to a point where you know that the current set of data has ended (for example because you hit the start of the next gene or EOF), and then process the accumulated data.

Re^6: Read, match string and print
by sophix (Sexton) on Feb 08, 2010 at 09:47 UTC
    Thanks. You are right. I was trying to jump between lines using \n, while in fact the file is being read line by line. I understand what you suggested, but it is very difficult for me to implement. Can you help me, please?

      It's not that hard. The process basically is:

      my %info; # here we collect all information

      # The name and order of the columns we want to print
      my @columns = qw(gi version cds);

      sub flush_info {
          # print out all information:
          print join '*', @info{@columns};
          # and forget the collected information
          %info = ();
      };

      while (<>) {
          if (m!GI:(\d+)!) {
              if ($info{cds}) {
                  # we are in a CDS block
                  $info{gi} = $1;
              };
          } elsif (m!^\s+CDS\s+(.*)!) {
              # a new gene information has started
              flush_info();
              # now remember the CDS
              $info{cds} = $1
          } elsif (m!^VERSION.*\w:(\d+)! ) {
              $info{ version } = $1;
          } else {
              warn "Ignoring unknown line [$_]\n";
          };
      };

      # Output any leftover information:
      flush_info();
        I tried to incorporate it as follows:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my %info; # here we collect all information
        my @columns = qw(gi version cds); # The name and order of the columns we want to print
        my $data = '/DATA/GenBankFile.gb'; # GenBank file is located at C:\DATA
        open INFILE, '<', $data or die "Cannot open file!\n";

        while (<INFILE>) {
            if (m!GI:(\d+)!) {
                if ($info{cds}) {
                    # we are in a CDS block
                    $info{gi} = $1;
                };
            } elsif (m!^\s+CDS\s+(.*)!) {
                # a new gene information has started
                flush_info();

                # now remember the CDS
                $info{cds} = $1
            } elsif (m!^VERSION.*\w:(\d+)! ) {
                $info{ version } = $1;
            } else {
                warn "Ignoring unknown line [$_]\n";
            };
        };

        # Output any leftover information:
        flush_info();

        sub flush_info {
            # print out all information:
            print join '*', @info{@columns}; #Line 34
            # and forget the collected information
            %info = ();
        };
        This always prints the "Ignoring unknown line" warning. It also gives a warning for line 34 (Use of uninitialized value).
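
One possible reading of the two symptoms, offered as a sketch rather than a confirmed diagnosis: the "Ignoring unknown line" warning fires for every line that matches none of the three patterns, which in a GenBank file is most of them, and the "uninitialized value" warning appears because flush_info() runs at the first CDS line, before gi, version and cds have all been set. The variant below rests on several assumptions. The helper names format_info and collect_records and the sample lines are hypothetical. It tests the VERSION pattern first, assuming the VERSION line itself contains "GI:" and would otherwise be caught by the GI branch. It also keeps the version across CDS blocks, assuming one VERSION line applies to all of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The name and order of the columns we want to print
my @columns = qw(gi version cds);

# Hypothetical helper: format one collected record as a '*'-joined line.
# Returns undef (in scalar context) if no CDS has been seen yet.
sub format_info {
    my ($info) = @_;
    return unless defined $info->{cds};
    # Use an empty string for fields that were never seen,
    # so join() does not warn about uninitialized values.
    return join '*', map { defined $info->{$_} ? $info->{$_} : '' } @columns;
}

# Hypothetical helper: scan the lines and return one output line per CDS block.
sub collect_records {
    my @lines = @_;
    my (%info, @out);
    for my $line (@lines) {
        if ($line =~ m!^VERSION.*\w:(\d+)!) {
            # Checked first (assumption: this line also contains "GI:")
            $info{version} = $1;
        }
        elsif ($line =~ m!^\s+CDS\s+(.*)!) {
            my $cds = $1;
            # A new CDS block starts: emit the previous one, if any
            my $done = format_info(\%info);
            push @out, $done if defined $done;
            delete $info{gi};      # forget the previous block's GI
            $info{cds} = $cds;     # version is kept (assumption)
        }
        elsif ($line =~ m!GI:(\d+)! and defined $info{cds}) {
            # A GI inside a CDS block
            $info{gi} = $1;
        }
        # All other lines are skipped silently instead of warned about.
    }
    # Emit any leftover block at the end of the input
    my $done = format_info(\%info);
    push @out, $done if defined $done;
    return @out;
}

# Hypothetical sample lines, loosely shaped like a GenBank record
my @sample = (
    'VERSION     AB000001.1  GI:1234567',
    '     CDS             12..345',
    '                     /db_xref="GI:7654321"',
    '     CDS             400..900',
    '                     /db_xref="GI:7654322"',
);
print "$_\n" for collect_records(@sample);
# prints:
# 7654321*1234567*12..345
# 7654322*1234567*400..900
```

To run this against a real file, the lines read from an opened filehandle could be passed to collect_records() in place of @sample. Whether dropping the warning for unmatched lines is appropriate depends on how much of the file you expect to recognise.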