Print a previous to previous of a matching line

ag88 has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone. I am new to programming and, of course, new to Perl as well. I needed to write a script to extract some information from a large file. My file looks like this:

# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
gi|338220664|gb|EGP06123.1|    gi|45383702|ref|NP_989542.1|    45.15    206    96    7    3    204    28    220    1e-51     170
gi|338220664|gb|EGP06123.1|    gi|15419940|gb|AAK97214.1|    44.17    206    98    7    3    204    28    220    5e-50     166
# BLASTP 2.2.28+
# Query: gi|338220666|gb|EGP06125.1| hypothetical protein GEW_00015 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found
# BLASTP 2.2.28+
# Query: gi|338220651|gb|EGP06111.1| hypothetical protein GEW_00275 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found

I basically want to extract the "query line", particularly the number after "gi" in the query line, but only for those queries that have 0 hits. So in this case my matching line would be "# 0 hits found". I have written a small script which extracts the matching line, but I am unable to extract the query line and the number after "gi" in the query line. My code is:

sub getGI
{
    open(FILE, "twentySeq1e-10.out") or die("Cannot open file");
    while(<FILE>)
    {
        my $line = $_;

        if($line=~/# 0 hits found/)
        {
            print "$line\n";
        }
    }
}

The output I want is the number after "gi" in the query line, only for those queries having 0 hits. For example, in this case the output would be:

338220666
338220651

The "query" line is two lines before the matching line. If someone could help me with this I would be grateful. Thanks.


Replies are listed 'Best First'.
Re: Print a previous to previous of a matching line by kcott (Archbishop) on Oct 08, 2013 at 11:35 UTC
G'day ag88,

Welcome to the monastery.

You can treat each `BLASTP` block as a single record. This makes it easy to identify which have "`0 hits found`", and print their "`gi`" values. (In the code below, I've truncated the data lines to 60 characters.)

#!/usr/bin/env perl -l
use strict;
use warnings;

{
    local $/ = "# BLASTP 2.2.28+\n";
    while (<DATA>) {
        print /gi\|(\d+)/ if /0 hits found/;
    }
}

__DATA__
# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GE
# Database: nr-25sep
# Fields: query id, subject id, % identity, alignment length
# 2 hits found
gi|338220664|gb|EGP06123.1| gi|45383702|ref|NP_989542.1|
gi|338220664|gb|EGP06123.1| gi|15419940|gb|AAK97214.1| 44.1
# BLASTP 2.2.28+
# Query: gi|338220666|gb|EGP06125.1| hypothetical protein GE
# Database: nr-25sep
# 0 hits found
# BLASTP 2.2.28+
# Query: gi|338220651|gb|EGP06111.1| hypothetical protein GE
# Database: nr-25sep
# 0 hits found

Output:

338220666
338220651

See "perlvar: Variables related to filehandles" for a discussion of this usage of "`$/`" (the input record separator).

-- Ken

Re: Print a previous to previous of a matching line by jethro (Monsignor) on Oct 08, 2013 at 09:40 UTC
sub getGI
{
    # Parentheses are needed to declare both variables with a single "my".
    my ($previous1, $previous2);
    open(FILE, "twentySeq1e-10.out") or die("Cannot open file");
    while(<FILE>)
    {
        my $line = $_;
        if($line=~/# 0 hits found/)
        {
            # $previous2 holds the line read two lines before the match.
            print "$previous2\n";
        }
        # Shift the history along by one line.
        $previous2= $previous1;
        $previous1= $line;
    }
    close(FILE);
}

The generalized solution would use an array. You use unshift() to add the line at the start of the array and you use pop() to remove the last line if the array has length n+1 (with n being the number of lines you want to remember). That is called a pipeline, queue, shift register or FIFO (first-in-first-out).

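The array-based generalization described above might look something like the following sketch. The sub name `lines_before_match`, its parameters, and the in-memory sample data are hypothetical and chosen only for illustration; the technique is the unshift()/pop() queue from the paragraph above.

```perl
use strict;
use warnings;

# Hypothetical helper: for every line matching $pattern, return the line
# that was read $n lines earlier from the filehandle $fh.
sub lines_before_match {
    my ($fh, $pattern, $n) = @_;
    my @history;    # newest line first, holds at most $n lines
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ $pattern && @history == $n) {
            push @found, $history[-1];    # the oldest remembered line
        }
        unshift @history, $line;          # add the new line at the start
        pop @history if @history > $n;    # drop the oldest once we hold n+1
    }
    return @found;
}

# Usage with an in-memory filehandle (made-up, shortened sample data)
my $data = join "\n",
    '# BLASTP', '# Query: gi|1|', '# Database: nr', '# 0 hits found', '';
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
print "$_\n" for lines_before_match($fh, qr/# 0 hits found/, 2);
close($fh);
```

With n = 2 this behaves like the two-variable version, but changing n does not require adding more variables.
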
Re^2: Print a previous to previous of a matching line by ag88 (Novice) on Oct 08, 2013 at 10:14 UTC
Thank you so much for the help, it worked. Thanks a lot :)
Re: Print a previous to previous of a matching line by McA (Priest) on Oct 08, 2013 at 09:46 UTC
Hi,

in this case I would take the following approach:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

# read first line assuming it is a kind of block separator
my $bsep = <>;
$/ = $bsep;

while(defined(my $block = <>)) {
    chomp $block;
    my @records = split /\n/, $block;
    next if @records < 3; # malformed block
    foreach my $record (@records) {
        say $record;
    }
    say "========================";
}

Now you can find and operate on every block however you like.

Best regards
McA

Re: Print a previous to previous of a matching line by hippo (Archbishop) on Oct 08, 2013 at 09:40 UTC
This is not an uncommon task. Here is one possible solution which you can alter to suit your particular requirements.
Re: Print a previous to previous of a matching line by Anonymous Monk on Oct 08, 2013 at 09:14 UTC
Put `my $previous_line;` at the top, and then assign to `$previous_line` some place that it makes sense, and then when you want to do the printing, I forget
Re^2: Print a previous to previous of a matching line by ag88 (Novice) on Oct 08, 2013 at 09:23 UTC
I want to get the line before the previous line. In short, the second line before a matching line.
Re^3: Print a previous to previous of a matching line by Anonymous Monk on Oct 08, 2013 at 09:29 UTC
yes, the answer is the same
Re: Print a previous to previous of a matching line by Generoso (Prior) on Oct 08, 2013 at 19:06 UTC
Try this; it works for me.

#!/usr/bin/perl -w
use strict;
use warnings;
#open(FILE, "twentySeq1e-10.out") or die("Cannot open file");
my $gi;
while(<DATA>){
    if(/^# Query: gi.([0-9]+)/) {$gi = $1;}
    if(/^# 0 hits found/){print "$gi\n";}
}
__DATA__
# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
gi|338220664|gb|EGP06123.1| gi|45383702|ref|NP_989542.1| 45.15 206 96 7 3 204 28 220 1e-51 170
gi|338220664|gb|EGP06123.1| gi|15419940|gb|AAK97214.1| 44.17 206 98 7 3 204 28 220 5e-50 166
# BLASTP 2.2.28+
# Query: gi|338220666|gb|EGP06125.1| hypothetical protein GEW_00015 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found
# BLASTP 2.2.28+
# Query: gi|338220651|gb|EGP06111.1| hypothetical protein GEW_00275 [Pasteurella multocida subsp. gallicida str. Anand1_poultry]
# Database: nr-25sep
# 0 hits found

RESULT

Process started >>>
338220666
338220651
<<< Process finished. (Exit code 0)

Re: Print a previous to previous of a matching line by pemungkah (Priest) on Oct 08, 2013 at 23:53 UTC
Anonymous Monk's suggestion is the right one, just a bit elliptical. Let's phrase this another way:

You're going through the file a line at a time. If you see a line you might want (a "Query" line), you save it in a variable and keep reading lines.

If you see a "hits" line that matches your criterion, the line you saved was one you want. Print it or stick it in an array for later, or...

If you see a "hits" line and it doesn't match your criterion, then you don't want the "Query" you saw previously. Throw it away by setting the variable to "".

I didn't write out the code because I think it might be more useful to you to write the code yourself. You shouldn't need anything more complicated than one variable to keep the query line in, another one to read the next line from <STDIN> into, a while() loop to keep reading until you're out of lines, and a couple of if() statements (is this line a "query" line, is this line a "hits" line, does this "hits" line meet my "I want the last 'Query' line" criterion) inside the loop. You don't even need a trailing check outside the loop, because a "hits" line always follows a "Query" line.

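For readers who want something to compare their own attempt against, the loop described above could be sketched roughly as follows. The sub name `gi_for_zero_hits` and the shortened sample data are hypothetical; the sketch also assumes that each "Query" line sits on a single line in the real file, as in the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the gi number of every query with 0 hits.
sub gi_for_zero_hits {
    my ($fh) = @_;
    my $query = '';    # the most recent "Query" line, or '' if none is pending
    my @gis;
    while (my $line = <$fh>) {
        if ($line =~ /^# Query:/) {
            $query = $line;    # a line we might want: save it, keep reading
        }
        elsif ($line =~ /^# (\d+) hits found/) {
            my $hits = $1;
            if ($hits == 0 && $query =~ /gi\|(\d+)/) {
                push @gis, $1;    # the saved query line is one we want
            }
            $query = '';          # either way, this query is finished
        }
    }
    return @gis;
}

# Usage with an in-memory filehandle (sample shortened from the question)
my $sample = <<'END';
# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005
# Database: nr-25sep
# 2 hits found
# BLASTP 2.2.28+
# Query: gi|338220666|gb|EGP06125.1| hypothetical protein GEW_00015
# Database: nr-25sep
# 0 hits found
END
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
print "$_\n" for gi_for_zero_hits($fh);    # prints 338220666
close($fh);
```

Passing `\*STDIN` instead of `$fh` would read from standard input, as suggested above.
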
Re^2: Print a previous to previous of a matching line by ag88 (Novice) on Oct 11, 2013 at 07:49 UTC
Thank you all for the suggestions. They were really helpful. The following code did my task:

sub getGiForZeroHits
{
    # Parentheses are needed to declare both variables with a single "my".
    my ($previous1, $previous2);
    open(FILEOUT,">giForZeroHits.txt") or die("Cannot open file");
    {
        open(FILE, "$inputSeqFileForBlast-1e-10.out") or die("Cannot open file");
        {
            while(<FILE>)
            {
                my $line = $_;
                if($line=~/# 0 hits found/)
                {
                    # The query line is two lines back; the gi number is its second "|"-separated field.
                    my @lineSpl = split(/\|/, $previous2);
                    print FILEOUT "$lineSpl[1]\n";
                } #close if
                $previous2= $previous1;
                $previous1= $line;
            } #close while
            close(FILE);
        } #close FILE
        close(FILEOUT)
    } #close FILEOUT
} #close sub

Re: Print a previous to previous of a matching line by BillKSmith (Monsignor) on Oct 08, 2013 at 20:10 UTC
You may be asking the wrong question. It appears that you want to consider two fields from each "logical record". If you could parse your data file first into records and then into fields, your strange requirement would disappear.

Bill
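
One way to read the records-then-fields idea is sketched below. The sub name `parse_records`, the hash keys `gi` and `hits`, and the shortened sample data are hypothetical, and the sketch assumes every record begins with the text "# BLASTP".

```perl
use strict;
use warnings;

# Hypothetical helper: split the report into one record per "# BLASTP" block,
# then pull two fields out of each record: the query gi number and the hit count.
sub parse_records {
    my ($fh) = @_;
    local $/ = "# BLASTP";    # assumption: every record starts with this text
    my @records;
    while (my $block = <$fh>) {
        my ($gi)   = $block =~ /^# Query: gi\|(\d+)/m;
        my ($hits) = $block =~ /^# (\d+) hits found/m;
        next unless defined $gi && defined $hits;    # skip empty or malformed blocks
        push @records, { gi => $gi, hits => $hits };
    }
    return @records;
}

# Usage with an in-memory filehandle (sample shortened from the question)
my $sample = <<'END';
# BLASTP 2.2.28+
# Query: gi|338220664|gb|EGP06123.1| hypothetical protein GEW_00005
# Database: nr-25sep
# 2 hits found
# BLASTP 2.2.28+
# Query: gi|338220651|gb|EGP06111.1| hypothetical protein GEW_00275
# Database: nr-25sep
# 0 hits found
END
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my @zero = grep { $_->{hits} == 0 } parse_records($fh);
print "$_->{gi}\n" for @zero;    # prints 338220651
close($fh);
```

Once each record is a small hash, "the line two before the match" stops mattering: the question becomes a simple filter on the `hits` field.
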