in reply to Re^5: Read, match string and print
in thread Read, match string and print

Thanks. You are right. I was trying to jump between lines using \n, while in fact the file is being read line by line. I understand what you suggested, but I am finding it very difficult to implement. Could you help me, please?

Re^7: Read, match string and print
by Corion (Patriarch) on Feb 08, 2010 at 10:06 UTC

    It's not that hard. The process basically is:

    my %info; # here we collect all information

    # The name and order of the columns we want to print
    my @columns = qw(gi version cds);

    sub flush_info {
        # print out all information:
        print join '*', @info{@columns};

        # and forget the collected information
        %info = ();
    };

    while (<>) {
        if (m!GI:(\d+)!) {
            if ($info{cds}) {
                # we are in a CDS block
                $info{gi} = $1;
            };
        } elsif (m!^\s+CDS\s+(.*)!) {
            # a new gene information has started
            flush_info();

            # now remember the CDS
            $info{cds} = $1
        } elsif (m!^VERSION.*\w:(\d+)! ) {
            $info{ version } = $1;
        } else {
            warn "Ignoring unknown line [$_]\n";
        };
    };

    # Output any leftover information:
    flush_info();
      I tried to incorporate it as follows:
      #!/usr/bin/perl
      use strict;
      use warnings;

      my %info; # here we collect all information
      my @columns = qw(gi version cds); # The name and order of the columns we want to print

      my $data = '/DATA/GenBankFile.gb'; # GenBank file is located at C:\DATA
      open INFILE, '<', $data or die "Cannot open file!\n";

      while (<INFILE>) {
          if (m!GI:(\d+)!) {
              if ($info{cds}) {
                  # we are in a CDS block
                  $info{gi} = $1;
              };
          } elsif (m!^\s+CDS\s+(.*)!) {
              # a new gene information has started
              flush_info();
              # now remember the CDS
              $info{cds} = $1
          } elsif (m!^VERSION.*\w:(\d+)! ) {
              $info{ version } = $1;
          } else {
              warn "Ignoring unknown line [$_]\n";
          };
      };

      # Output any leftover information:
      flush_info();

      sub flush_info {
          # print out all information:
          print join '*', @info{@columns}; #Line 34
          # and forget the collected information
          %info = ();
      };
      This always prints out "Ignoring unknown value", and it gives an error for line 34 (Use of uninitialized value).
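The message about line 34 is most likely separate from the unknown-line messages. flush_info() prints all three columns whether or not they were ever set, and it runs at the first CDS line (when %info is still empty) and once more at the end, so the hash slice can contain undef. Below is a minimal sketch of one way to guard against that, assuming a missing field should print as an empty string. The helper name format_info is hypothetical and not part of the code posted above.

```perl
use strict;
use warnings;

my @columns = qw(gi version cds);

# Hypothetical helper: format one record, or return nothing if no
# information has been collected yet.
sub format_info {
    my (%info) = @_;
    return unless %info;    # nothing collected, so nothing to print

    # Substitute an empty string for any column that was never set,
    # so join never sees an undefined value.
    return join '*', map { defined $info{$_} ? $info{$_} : '' } @columns;
}

# Made-up record with no version field, only to exercise the helper
print format_info(gi => 123, cds => '1..100'), "\n";
```

Whether a half-filled record should be printed at all, or skipped, depends on what the output is used for; the sketch only removes the warning.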

        Nowhere in that source code does the word "value" appear, so I highly doubt that the program ever outputs the string "Ignoring unknown value". I put the following line into the program:

        warn "Ignoring unknown line [$_]\n";

        so that you know which parts of the input get discarded. That line even prints the input that is being discarded. If you determine that a certain kind of line is never of use, just add another rule that ignores it.
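Adding a rule that ignores a kind of line could look roughly like the sketch below. It keeps the three patterns from the code above in the same order and adds one more branch before the final else. The list of keywords in $ignore is only an assumption about which GenBank lines are of no interest, and classify is a hypothetical helper used here so the rules can be tried on their own.

```perl
use strict;
use warnings;

# Assumption: lines starting with these keywords are never needed;
# adjust the list to match your own file.
my $ignore = qr/^(?:LOCUS|DEFINITION|ACCESSION|KEYWORDS|SOURCE)\b/;

# Hypothetical helper: report which rule a single input line falls under.
sub classify {
    my ($line) = @_;
    if    ($line =~ m!GI:(\d+)!)           { return 'gi' }
    elsif ($line =~ m!^\s+CDS\s+(.*)!)     { return 'cds' }
    elsif ($line =~ m!^VERSION.*\w:(\d+)!) { return 'version' }
    elsif ($line =~ $ignore)               { return 'ignored' }  # the new rule: skip silently
    else                                   { return 'unknown' }  # would still reach the warn
}

# Made-up sample lines, only to exercise the rules
for my $line ("LOCUS       EXAMPLE", "     CDS             1..100", "ORIGIN") {
    printf "%-8s %s\n", classify($line), $line;
}
```

In the real loop the new branch would simply have an empty body, so matching lines are dropped without a warning while anything truly unexpected is still reported.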