cowboyrocks has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks
I am trying to catch the last "Sbjct:" line of each query record (the "Sbjct: 38348698" and "Sbjct: 94217074" lines in this example), but I am
unable to do so. How can I catch these lines?
My code looks like this :-
if($_ =~ /Sbjct:\s*(\d+)\D+(\d+)*$/ .. /\n\n\n\AQuery\=/) { print "$_\n" }
My input file looks like this:-
Query= 30 3'_UTR BP;A;C;G;T;OTHER
AGGCATAAACCACATCCAGCCACCTCCTTCTGATCAGCAGCAAAGCTGACGTTTTGATCTCCATCTGTCT
GATTCTTGTGTCTACTTCTCAGTTTACAACTCCAGTGGGAAAGAAAGAGCTTTATTTACAGACCCATAAA
AATCCCATCAGTGTCGTCCCCTGCTGAGAGGCCATGTGAGACCATATGGAAAAACAACAGCCATAATGGC
AGCATGGCAGTGGAAGGGTTTGTCTTGTGCCCAGGCCTTGCGGTCATGCAAGTTTCTTGTGGATCCTGTT
(633 letters)

                                                                  Score    E
Sequences producing significant alignments:                      (Bits) Value

gi|51511750|ref|NC_000021.7|NC_000021 Homo sapiens chromosome ...  1243   0E0

>gi|51511750|ref|NC_000021.7|NC_000021 Homo sapiens chromosome 21, reference assembly, complete sequence
Length = 46944323

Score = 1243 bits (629), Expect = 0E0
Identities = 632/633 (99%)
Strand = Plus / Minus

Query: 1        AGGCATAAACCACATCCAGCCACCTCCTTCTGATCAGCAGCAAAGCTGACGTTTTGATCT 60
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 38348818 AGGCATAAACCACATCCAGCCACCTCCTTCTGATCAGCAGCAAAGCTGACGTTTTGATCT 38348759

Query: 61       CCATCTGTCTGATTCTTGTGTCTACTTCTCAGTTTACAACTCCAGTGGGAAAGAAAGAGC 120
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 38348758 CCATCTGTCTGATTCTTGTGTCTACTTCTCAGTTTACAACTCCAGTGGGAAAGAAAGAGC 38348699

Query: 121      TTTATTTACAGACCCATAAAAATCCCATCAGTGTCGTCCCCTGCTGAGAGGCCATGTGAG 180
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 38348698 TTTATTTACAGACCCATAAAAATCCCATCAGTGTCGTCCCCTGCTGAGAGGCCATGTGAG 38348639

Query= 72 3'_UTR BP;A;C;G;T;OTHER
CAAGAAACTATATAGGTATACACTTACGACTTCACAAAACCTATACTTAATATAGTAAATCTAAGTAAAC
ATGTATTACTCAAAGTAATATATTTAGAATTATGTATTAGTATAAGATCAGAATTGAATTTAAGTTGTTG
GTGACATCTGCATCATTTCATAGGATTAGAACTTACTCAAAATAATGTAAATCTTTAAAAATATAAATTA
GAATGACAAGTGGGAATCATAAATTAAACGTTAATGGTTTCTTATGCTCTTTTTAAATATAGAAATATCA
(897 letters)

                                                                  Score    E
Sequences producing significant alignments:                      (Bits) Value

gi|89161216|ref|NC_000009.10|NC_000009 Homo sapiens chromosome...  1733   0E0

>gi|89161216|ref|NC_000009.10|NC_000009 Homo sapiens chromosome 9, reference assembly, complete sequence
Length = 140273252

Score = 1733 bits (877), Expect = 0E0
Identities = 892/897 (99%), Gaps = 1/897 (0%)
Strand = Plus / Minus

Query: 1        CAAGAAACTATATAGGTATACACTTACGACTTCACAAAACCTATACTTAATATAGTAAAT 60
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 94217254 CAAGAAACTATATAGGTATACACTTACGACTTCACAAAACCTATACTTAATATAGTAAAT 94217195

Query: 61       CTAAGTAAACATGTATTACTCAAAGTAATATATTTAGAATTATGTATTAGTATAAGATCA 120
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 94217194 CTAAGTAAACATGTATTACTCAAAGTAATATATTTAGAATTATGTATTAGTATAAGATCA 94217135

Query: 121      GAATTGAATTTAAGTTGTTGGTGACATCTGCATCATTTCATAGGATTAGAACTTACTCAA 180
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 94217134 GAATTGAATTTAAGTTGTTGGTGACATCTGCATCATTTCATAGGATTAGAACTTACTCAA 94217075

Query: 181      AATAATGTAAATCTTTAAAAATATAAATTAGAATGACAAGTGGGAATCATAAATTAAACG 240
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 94217074 AATAATGTAAATCTTTAAAAATATAAATTAGAATGACAAGTGGGAATCATAAATTAAACG 94217015

Query= 113 3'_UTR BP;A;C;G;T;OTHER
TATTTTCTTATGTGGGTCTTATGCTTCCATTAACAAATGCTCTGTCTTCAATGATCAAATTTTGAGCAAA
GAAACTTGTGCTTTACCAAGGGGAATTACTGAAAAAGGTGATTACTCCTGAAGTGAGTTTTACACGAACT
GAAATGAGCATGCATTTTCTTGTATGATAGTGACTAGCACTAGACATGTCATGGTCCTCATGGTGCATAT
AAATATATTTAACTTAACCCAGATTTTATTTATATCTTTATTCACCTTTTCTTCAAAATCGATATGGTGG
CTGCAAAACTAGAATTGTTGCATCCCTCAATTGAATGAGGGCCATATCCCTGTGGTATTCCTTTCCTGCT
TTGGGGCTTTAGAATTCTAATTGTCAGTGATTTTGTATATGAAAACAAGTTCCAAATCCACAGCTTTTAC
I want the output like this:-
Sbjct: 38348698 TTTATTTACAGACCCATAAAAATCCCATCAGTGTCGTCCCCTGCTGAGAGGCCATGTGAG 38348639
Sbjct: 94217074 AATAATGTAAATCTTTAAAAATATAAATTAGAATGACAAGTGGGAATCATAAATTAAACG 94217015
Thanks in advance
cowboy

Replies are listed 'Best First'.
Re: Parsing a file
by jwkrahn (Abbot) on Apr 16, 2009 at 06:52 UTC
    if($_ =~ /Sbjct:\s*(\d+)\D+(\d+)*$/ .. /\n\n\n\AQuery\=/)

    You are using the flip-flop operator so I will assume that there is a single line in $_.   If so then /\n\n\n\AQuery\=/ will never match because each line only has one newline and it is at the end of the line.   Also there appear to be only two newlines before the string 'Query=' and the anchor \A can only match at the beginning of a string so there can't be anything in front of it.
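    To make the first point concrete, here is a minimal sketch (the sample text and the sub name newlines_per_line are made up for illustration) that reads an in-memory "file" one line at a time and counts the newlines in each line. Every line holds exactly one, so a pattern that needs three in a row has nothing to match against:

```perl
use strict;
use warnings;

# Hypothetical three-line sample; the blank line is a line of its own.
my $text = "Sbjct: 1 ACGT 4\n\nQuery= 72\n";

# Return, for each line read, the number of newline characters it contains.
sub newlines_per_line {
    my ($input) = @_;
    open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
    my @counts;
    while ( my $line = <$fh> ) {
        my $n = () = $line =~ /\n/g;    # count the newlines in this line
        push @counts, $n;
    }
    close $fh;
    return @counts;
}

print join( ',', newlines_per_line($text) ), "\n";    # prints 1,1,1
```

    The same holds for any file read with the default input record separator: each read stops right after the first newline it meets.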

    You probably want something like this (UNTESTED):

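    my $subject;
    while ( <> ) {
        # Remember the most recent Sbjct: line of the current record.
        $subject = $_ if /^Sbjct:\s*\d+\D+\d+$/;
        # A new record starts: print the last Sbjct: line of the previous one.
        if ( /^Query=/ && defined $subject ) {
            print $subject;
            undef $subject;
        }
    }
    # The final record is not followed by another "Query=" line.
    print $subject if defined $subject;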
Re: Parsing a file
by Anonymous Monk on Apr 16, 2009 at 06:26 UTC
Re: Parsing a file
by citromatik (Curate) on Apr 16, 2009 at 06:40 UTC

    I don't know exactly what you are trying to accomplish, but if you only want to extract the "Sbjct" lines you could simply try something like:

    while (<$fh>) { print if /Sbjct/ }

    As a side note, I'm not a fan of BioPerl at all, but it may be useful for you. You can try this tutorial to see how to parse BLAST reports.

    Also, if you intend to parse the results with a custom-made script, it is always better to run blast with the option -m 8 and get the results in tabular format.
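    As a rough sketch of what parsing the tabular format could look like: each hit is one tab-separated line with twelve columns (query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The hash keys, the sub name parse_tabular_line and the example row below are made up for illustration; the row is not real output from the poster's run.

```perl
use strict;
use warnings;

# Illustrative names for the 12 columns of tabular (-m 8) BLAST output.
my @FIELDS = qw(query subject identity length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Parse one tab-separated line into a hash reference.
# Returns undef for blank lines, '#' comment lines and malformed rows.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @values = split /\t/, $line;
    return unless @values == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @values;    # hash slice: pair column names with values
    return \%hit;
}

# Made-up example row (hypothetical ids and coordinates).
my $row = join "\t", 'query30', 'subject21', '99.84', 633, 1, 0,
                     1, 633, 38348818, 38348186, '0.0', 1243;
my $hit = parse_tabular_line($row);
print "$hit->{query}: subject $hit->{s_start}..$hit->{s_end}\n";
```

    With one line per hit, the subject coordinates are plain fields, so there is no need to track the last Sbjct line of an alignment block.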

    Hope this helps

    Update: The code has been updated to match the sample output the poster has since included.

    citromatik

Re: Parsing a file
by 1Nf3 (Pilgrim) on Apr 16, 2009 at 06:31 UTC

    I'm not sure what your desired output is. Are you trying to catch the last line containing Sbjct? If you post the desired output for the provided example input file, I'll try to find and post a solution.

    Anyway, it's very likely that the $_ variable you are parsing contains one line at a time, and searching for multiple \n doesn't work.

    Regards,
    Luke
      Hi. I have updated my post with the desired output. Thanks

        If you want to extract Sbjct: lines, and print them out, the following should work fine:

        #!/usr/bin/perl -w
        use strict;

        while (<>) {
            print if /^Sbjct:\s*\d+\D+\d+$/;
        }

        After running this code on your sample input data I get:

        Sbjct: 38348818 AGGCATAAACCACATCCAGCCACCTCCTTCTGATCAGCAGCAAAGCTGACGTTTTGATCT 38348759
        Sbjct: 38348758 CCATCTGTCTGATTCTTGTGTCTACTTCTCAGTTTACAACTCCAGTGGGAAAGAAAGAGC 38348699
        Sbjct: 38348698 TTTATTTACAGACCCATAAAAATCCCATCAGTGTCGTCCCCTGCTGAGAGGCCATGTGAG 38348639
        Sbjct: 94217254 CAAGAAACTATATAGGTATACACTTACGACTTCACAAAACCTATACTTAATATAGTAAAT 94217195
        Sbjct: 94217194 CTAAGTAAACATGTATTACTCAAAGTAATATATTTAGAATTATGTATTAGTATAAGATCA 94217135
        Sbjct: 94217134 GAATTGAATTTAAGTTGTTGGTGACATCTGCATCATTTCATAGGATTAGAACTTACTCAA 94217075
        Sbjct: 94217074 AATAATGTAAATCTTTAAAAATATAAATTAGAATGACAAGTGGGAATCATAAATTAAACG 94217015

        Hope that helps,
        Luke
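
        Printing every Sbjct: line gives more than the question asks for, since the desired output keeps only the last Sbjct: line of each Query= record. A minimal sketch of that narrowing, assuming each record starts with a line beginning "Query=" and each Sbjct: line has the form "Sbjct: start sequence end" (the sub name last_sbjct_per_query and the shortened report are made up for illustration):

```perl
use strict;
use warnings;

# Return the last "Sbjct:" line of each "Query=" record, in input order.
sub last_sbjct_per_query {
    my ($fh) = @_;
    my ( @result, $last );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^Query=/ ) {
            # A new record begins: keep the last Sbjct: line of the previous one.
            push @result, $last if defined $last;
            undef $last;
        }
        elsif ( $line =~ /^Sbjct:\s*\d+\s+\S+\s+\d+\s*$/ ) {
            $last = $line;    # overwritten until the record ends
        }
    }
    push @result, $last if defined $last;    # the final record
    return @result;
}

# Shortened, made-up report with two records.
my $report = <<'END';
Query= 30
Sbjct: 100 ACGT 97
Sbjct: 96 TTGA 93
Query= 72
Sbjct: 500 GGCC 497
END

open my $fh, '<', \$report or die "Cannot open in-memory file: $!";
print last_sbjct_per_query($fh);
close $fh;
```

        For the made-up report this prints the "Sbjct: 96" and "Sbjct: 500" lines. To run it on a real report, open the file by name instead of the in-memory string and pass that filehandle.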

Re: Parsing a file
by Anonymous Monk on Apr 16, 2009 at 06:23 UTC
    I get
    Use of uninitialized value in pattern match (m//) at - line 1.
    I think $_ is undefined. Can you show code that actually demonstrates your problem?