Re^4: Read, match string and print

Thanks once again. Yes, it is working but it is getting the wrong input and hence prints out incorrectly.

/protein_id="NP_12312"
/db_xref="GI:7546536"

Here, in order to get the GI number, I use this matching expression: elsif ( /^\s*protein_id\S*\n\s*\Sdb_xref="GI:(\d+)/ ) but it is not matching. Am I making some fundamental mistake with the matching operators?

Re^5: Read, match string and print by Corion (Patriarch) on Feb 08, 2010 at 09:32 UTC
In the other code posted here, I see that you're reading the file line by line. You can't match more than one line if you're reading/processing each line separately. You will need to either set a flag or collect all information up to a point where you know that the current set of data has ended (for example because you hit the start of the next `gene` or EOF), and then process the accumulated data.
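The "set a flag" variant can be sketched as follows. This is only a minimal illustration, not the code from the thread: the helper name `extract_gi_pairs` is hypothetical, and it assumes that the `/db_xref="GI:..."` line directly follows the `/protein_id="..."` line, as in the two sample lines quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [protein_id, gi] pairs found in the
# given lines. A GI is only accepted on the line right after a protein_id
# (an assumption based on the sample input, not a rule of the file format).
sub extract_gi_pairs {
    my @lines = @_;
    my @pairs;
    my $protein_id;    # the "flag": defined only right after a protein_id line

    for my $line (@lines) {
        if ( $line =~ m{^\s*/protein_id="([^"]+)"} ) {
            $protein_id = $1;                  # remember it for the next line
        }
        elsif ( defined $protein_id
            and $line =~ m{^\s*/db_xref="GI:(\d+)"} ) {
            push @pairs, [ $protein_id, $1 ];
            undef $protein_id;                 # pair is complete, reset the flag
        }
        else {
            undef $protein_id;                 # any other line breaks the pair
        }
    }
    return @pairs;
}

# Usage with the two sample lines, one "line" per element as a
# line-by-line reader would see them.
my @sample = (
    qq{/protein_id="NP_12312"\n},
    qq{/db_xref="GI:7546536"\n},
);
print join( "\t", @$_ ), "\n" for extract_gi_pairs(@sample);
```

Each regex only ever looks at a single line; the state carried between iterations in `$protein_id` is what connects the two lines, which is why no `\n` is needed in either pattern.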
Re^6: Read, match string and print by sophix (Sexton) on Feb 08, 2010 at 09:47 UTC
Thanks. You are right. I was trying to jump between lines using \n, while in fact I am reading the file line by line. I understand what you suggested, but it is very difficult for me to implement. Can you help me, please?
Re^7: Read, match string and print by Corion (Patriarch) on Feb 08, 2010 at 10:06 UTC
It's not that hard. The process basically is:

    my %info; # here we collect all information

    # The name and order of the columns we want to print
    my @columns = qw(gi version cds);

    sub flush_info {
        # print out all information:
        print join '', @info{@columns};
        # and forget the collected information
        %info = ();
    };

    while (<>) {
        if (m!GI:(\d+)!) {
            if ($info{cds}) { # we are in a CDS block
                $info{gi} = $1;
            };
        } elsif (m!^\s+CDS\s+(.)!) {
            # a new gene information has started
            flush_info();
            # now remember the CDS
            $info{cds} = $1
        } elsif (m!^VERSION.*\w:(\d+)! ) {
            $info{ version } = $1;
        } else {
            warn "Ignoring unknown line [$_]\n";
        };
    };

    # Output any leftover information:
    flush_info();
Re^8: Read, match string and print by sophix (Sexton) on Feb 08, 2010 at 10:23 UTC
Re^9: Read, match string and print by Corion (Patriarch) on Feb 08, 2010 at 10:26 UTC