ag88 has asked for the wisdom of the Perl Monks concerning the following question:
Hello everyone. I am new to programming, and new to Perl as well. I need to write a script to extract some information from a large file. My file looks like this:
# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
gi|338220664|gb|EGP06123.1| gi|45383702|ref|NP_989542.1| 45.15 206 96 7 3 204 28 220 1e-51 170
gi|338220664|gb|EGP06123.1| gi|15419940|gb|AAK97214.1| 44.17 206 98 7 3 204 28 220 5e-50 166
# BLASTP 2.2.28+
# Query: gi|338220666|gb|EGP06125.1| hypothetical protein GEW_00015 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found
# BLASTP 2.2.28+
# Query: gi|338220651|gb|EGP06111.1| hypothetical protein GEW_00275 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found
I want to extract the "query line", specifically the number after "gi" in the query line, but only for the queries that have 0 hits. So in this case my matching line would be "# 0 hits found". I have written a small script that extracts the matching line, but I am unable to extract the query line and the number after "gi" in it. My code is:
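sub getGI {
    # Lexical filehandle and three-argument open; report why the open failed
    open(my $fh, '<', "twentySeq1e-10.out") or die("Cannot open file: $!");
    while (my $line = <$fh>) {
        # Print only the lines that report zero hits
        if ($line =~ /# 0 hits found/) {
            print $line;
        }
    }
    close($fh);
}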
The desired output is the number after "gi" in the query line, only for the queries that have 0 hits. For example, in this case the output would be:
338220666
338220651
The "query" line is two lines before the matching line. If someone could help me with this I would be grateful. Thanks.
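One possible approach, sketched here under the assumption that every record has a "# Query: gi|NUMBER|..." line somewhere before its "# N hits found" line: instead of looking back two lines, remember the GI number from the most recent query line and emit it when a "# 0 hits found" line arrives. The sub name zero_hit_gis and the in-memory sample are illustrative only and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: reads BLAST tabular output (with comment lines)
# from a filehandle and returns the GI numbers of queries with 0 hits.
sub zero_hit_gis {
    my ($fh) = @_;
    my ($gi, @found);
    while (my $line = <$fh>) {
        if ($line =~ /^# Query: gi\|(\d+)\|/) {
            $gi = $1;                 # remember the most recent query's GI
        }
        elsif ($line =~ /^# 0 hits found/ && defined $gi) {
            push @found, $gi;         # this query had no hits
            undef $gi;                # do not report the same query twice
        }
    }
    return @found;
}

# Demo on a shortened in-memory copy of the sample from the post.
my $sample = <<'END';
# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005
# Database: nr-25sep
# 2 hits found
# BLASTP 2.2.28+
# Query: gi|338220666|gb|EGP06125.1| hypothetical protein GEW_00015
# Database: nr-25sep
# 0 hits found
# BLASTP 2.2.28+
# Query: gi|338220651|gb|EGP06111.1| hypothetical protein GEW_00275
# Database: nr-25sep
# 0 hits found
END

open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
print "$_\n" for zero_hit_gis($fh);
close($fh);
```

To run this against the real file, open "twentySeq1e-10.out" for reading and pass that filehandle to the sub instead of the in-memory one. Tracking the last query line, rather than counting lines back, keeps the script working even if the number of comment lines between the query and the hit count varies.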