
It's not that hard. The process basically is:

my %info; # here we collect all information

# The name and order of the columns we want to print
my @columns = qw(gi version cds);

sub flush_info {
    # print out all information:
    print join '*', @info{@columns};
    # and forget the collected information
    %info = ();
};

while (<>) {
    if (m!GI:(\d+)!) {
        if ($info{cds}) { # we are in a CDS block
            $info{gi} = $1;
        };
    } elsif (m!^\s+CDS\s+(.*)!) {
        # a new gene information has started
        flush_info();
        # now remember the CDS
        $info{cds} = $1
    } elsif (m!^VERSION.*\w:(\d+)! ) {
        $info{ version } = $1;
    } else {
        warn "Ignoring unknown line [$_]\n";
    };
};

# Output any leftover information:
flush_info();

Re^8: Read, match string and print
by sophix (Sexton) on Feb 08, 2010 at 10:23 UTC
    I tried to incorporate it as follows:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %info; # here we collect all information
    my @columns = qw(gi version cds); # The name and order of the columns we want to print
    my $data = '/DATA/GenBankFile.gb'; # GenBank file is located at C:\DATA

    open INFILE, '<', $data or die "Cannot open file!\n";

    while (<INFILE>) {
        if (m!GI:(\d+)!) {
            if ($info{cds}) { # we are in a CDS block
                $info{gi} = $1;
            };
        } elsif (m!^\s+CDS\s+(.*)!) {
            # a new gene information has started
            flush_info();
            # now remember the CDS
            $info{cds} = $1
        } elsif (m!^VERSION.*\w:(\d+)! ) {
            $info{ version } = $1;
        } else {
            warn "Ignoring unknown line [$_]\n";
        };
    };

    # Output any leftover information:
    flush_info();

    sub flush_info {
        # print out all information:
        print join '*', @info{@columns}; #Line 34
        # and forget the collected information
        %info = ();
    };
    This always prints out the Ignoring unknown value. And it gives an error for line 34 (Use of uninitialized value)

      Nowhere in that source code does the word value appear, so I highly doubt that the program ever outputs the string Ignoring unknown value. I put the following line into the program:

      warn "Ignoring unknown line [$_]\n";

      so that you know which parts of the input get discarded. The warning even prints the line that gets discarded. If you determine that a certain kind of line is never of use, just add another rule that ignores it.
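
      As a minimal sketch of such an ignore rule: the extra `elsif` branch below silently skips lines matching a pattern and leaves the warning for everything else. The keywords in `$ignore` and the helper name `unknown_lines` are assumptions made up for illustration, not taken from your file; adjust them to whatever your data actually contains.

```perl
use strict;
use warnings;

# Hypothetical pattern for lines that are never of use. The keywords
# listed here are an assumption; replace them with your own.
my $ignore = qr!^(?:LOCUS|DEFINITION|ACCESSION|KEYWORDS|//)!;

# Returns the lines that match neither a known rule nor the ignore rule,
# i.e. the ones that would still trigger the "Ignoring unknown line" warning.
sub unknown_lines {
    my @unknown;
    for (@_) {
        if (m!GI:(\d+)!) {
            # handled as in the main loop
        } elsif (m!^\s+CDS\s+(.*)!) {
            # handled as in the main loop
        } elsif (m!^VERSION.*\w:(\d+)!) {
            # handled as in the main loop
        } elsif ($_ =~ $ignore) {
            # known to be useless: skip without a warning
        } else {
            push @unknown, $_;
        }
    }
    return @unknown;
}
```

      In the real loop you would keep the existing branch bodies and only add the `$ignore` branch before the final `else`.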

        So I turned off the warnings
        #!/usr/bin/perl
        use strict;
        use warnings;

        my %info; # here we collect all information
        my @columns = qw(gi version cds); # The name and order of the columns we want to print
        my $data = '/PRBB/Practice.gb'; # GenBank file is located at C:\PRBB

        open INFILE, '<', $data or die "Please insert a new coin!\n";

        while (<INFILE>) {
            if (m!GI:(\d+)!) {
                if ($info{cds}) { # we are in a CDS block
                    $info{gi} = $1;
                };
            } elsif (m!^\s+CDS\s+(.*)!) {
                # a new gene information has started
                flush_info();
                # now remember the CDS
                $info{cds} = $1
            } elsif (m!^VERSION.*\w:(\d+)! ) {
                $info{ version } = $1;
            #} else {warn "Ignoring unknown line [$_]\n";};
            };}

        # Output any leftover information:
        flush_info();

        sub flush_info {
            # print out all information:
            print join '*', @info{@columns};
            # and forget the collected information
            %info = ();
        };
        This gives an error at print join '*', @info{@columns}; and it prints all the matches on one single line. How can I print them out in the following structure instead:
        if ( defined $cds && defined $gi && defined $version ) { # Print only when all variables are defined
            print "$gi\t$version\t$cds\n";
            $gi = $cds = undef; # Get ready for the next loop
        }
        Thanks for sparing time!
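
Two things in the code above probably explain both symptoms. `flush_info` is first called when the first CDS line is seen, while `%info` is still empty, so the hash slice contains undefined values and `join` warns about them. And the `print` never emits a newline, so every record ends up on the same line. A minimal sketch of one way to get the tab-separated, one-record-per-line output asked for; the helper name `format_record` is an assumption for illustration, not part of the code posted earlier in the thread:

```perl
use strict;
use warnings;

# The name and order of the columns we want to print
my @columns = qw(gi version cds);

# Hypothetical helper: format one record as a tab-separated line.
# Returns nothing (undef in scalar context) while any column is still
# undefined, so incomplete records are never printed.
sub format_record {
    my ($info) = @_;
    return if grep { !defined $info->{$_} } @columns;
    return join("\t", @{$info}{@columns}) . "\n";
}
```

Inside `flush_info` this could replace the `print join ...` line with `my $line = format_record(\%info); print $line if defined $line;`, leaving the `%info = ();` reset as it is.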