Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,
I am using a Perl script to match two different patterns in the same file, and I want to print each pair of matching lines one after the other. Instead, only the result for the second pattern is correct: the first pattern never moves past line 1 of the input file.
Can anyone help me?
#!/usr/bin/perl
use strict;
use warnings;

my ($file1) = @ARGV;
open(FILE, "<", "$file1") or die "Can't Open File1\n";
my $pattern = '^LOC';
my $pat     = 'TAAAT';
while (my $line1 = <FILE>) {
    chomp($line1);
    #print "Line1:$line1\n";
    if ($line1 =~ /$pattern/) {
        while (my $line2 = <FILE>) {
            chomp($line2);
            #print "Line2:$line2\n";
            if ($line2 =~ /$pat/) {
                print "\nMatches: . $line1\n";
                print "Matches: . $line2\n";
            }
            else {
                print "";
            }
        }
    }
}
close(FILE);

Replies are listed 'Best First'.
Re: pattern and loop
by choroba (Cardinal) on Jun 23, 2014 at 14:06 UTC
    By reading from <FILE> in the inner loop, you exhaust the file handle. The handle keeps only one read position: it doesn't remember where $line1 was last read, and it doesn't automagically rewind the file. So the outer loop reads only the very first line; on its next iteration, it sees the end of the file.
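    To see the effect in isolation, here is a minimal sketch that uses a made-up three-line input held in memory instead of a real file. The sub name count_outer_lines and the sample text are only for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical three-line input, kept in memory so the sketch is self-contained.
my $data = "LOC one\nxxTAAATxx\nLOC two\n";

# Returns how many lines the outer loop gets to read.
sub count_outer_lines {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";
    my $outer = 0;
    while (my $line1 = <$fh>) {
        $outer++;
        # The inner loop shares the same read position and runs to end of file.
        while (my $line2 = <$fh>) { }
    }
    close($fh);
    return $outer;
}

print "outer loop saw ", count_outer_lines($data), " line(s)\n";
# prints: outer loop saw 1 line(s)
```

    Both loops advance the same read position, so whatever the inner loop consumes is gone for the outer loop.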
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: pattern and loop
by toolic (Bishop) on Jun 23, 2014 at 14:04 UTC

    Show a small sample of your:

    • input file
    • actual output
    • expected output
      Input file
      LOC_Os01g01320.1 : PS00022 EGF_1 EGF-like domain signature 1.
      390 - 401 CaCtgCatGTgC L=(-1)
      1013 - 1024 CcCaaGgtGTtC L=(-1)
      LOC_Os01g01320.1 : PS00099 THIOLASE_3 Thiolases active site.
      976 - 989 AAGTACCAgAaAgA L=(-1)
      1269 - 1282 AAAGACGGtAaAtG L=(-1)
      1390 - 1403 GAAGACCAtAgAcA L=(-1)
      LOC_Os01g01320.1 : PS00197 2FE2S_FER_1 2Fe-2S ferredoxin-type iron-sulfur binding region signature.
      1052 - 1060 CTGAACTTC L=(-1)
      LOC_Os01g01320.1 : PS00269 DEFENSIN Mammalian defensins signature.
      868 - 896 CaCgctg.CtgaccttGtCcaactacgaCC L=(-1)
      LOC_Os01g01320.1 : PS00956 HYDROPHOBIN Fungal hydrophobins signature.
      2248 - 2259 GAgCATTTaCCT L=(-1)
      LOC_Os01g01320.1 : PS01177 ANAPHYLATOXIN_1 Anaphylatoxin domain signature.
      462 - 494 CCttGaacgcat.TCAaggaga...TatCtatggc.CC L=(-1)
      1786 - 1817 CCtgGcgttgattTCAacg......AaaCttactctCC L=(-1)
      2428 - 2461 CCatAtcttagt.GCTattggc...GcaCttatgtaCC L=(-1)
      LOC_Os01g01320.1 : PS50214 DISINTEGRIN_2 Disintegrin domain profile.
      340 - 425 GGTCATCATTTGATCACTGACAAATTTCCCaAATCCTCCGATTTCGtatgCACTGCATGT L=-1
      GCTACTGGGAaACTGATTTTGAGACC--------
      868 - 964 --------------CACGCTGctgaCCTTGTCCAactacgacCAACTGCATATCATGaaa L=-1
      ctTCCCcaatGCAGTTAGTACGTGGAaatcttccaaATATTTCCCATTTGC--------
      LOC_Os01g01320.1 : PS50842 EXPANSIN_EG45 Expansin, family-45 endoglucanase-like domain profile.
      1422 - 1494 CGATGAGTcTATaAGGGTTGATGAAATTGCCATCAACTACTCTGAAACTGGagaATCTTT L=-1
      TGATAGaaGAGCT---------------------------------------------
      2712 - 2793 TGCCGCTTgTGTtgttcAGatggaAACTGGTTATATTAAGAGCAACATCACTaagcatat L=-1
      tGCTCCTAAaTTATTTTATC-----CT---------------------------------
      ------------
      LOC_Os01g01320.1 : PS50868 POST_SET Post-SET domain profile.
      1961 - 1977 ACATCCCGAATCAGAAG L=-1
      LOC_Os01g01320.1 : PS51173 CBM2 CBM2 (Carbohydrate-binding type-2) domain signature and profile.
      2593 - 2703 ATAATTGCATTATATGAGGCATCATGTGAATGTGTATGGCTTCGCAgaatGGTTAACCAC L=-1
      ATATTAACATCTTGT--------GGTATTGGTTcATTGGAATcacCTACCATTATCTAT
      LOC_Os01g01320.1 : PS51364 TB TGF-beta binding (TB) domain profile.
      Output I expect
      LOC_Os01g01320.1 : PS00099 THIOLASE_3 Thiolases active site.
      1269 - 1282 AAAGACGGtAaAtG L=(-1)
      LOC_Os01g01320.1 : PS50842 EXPANSIN_EG45 Expansin, family-45 endoglucanase-like domain profile.
      tGCTCCTAAaTTATTTTATC-----CT---------------------------------
      Output I get
      LOC_Os01g01320.1 : PS00022 EGF_1 EGF-like domain signature 1.
      1269 - 1282 AAAGACGGtAaAtG L=(-1)
      LOC_Os01g01320.1 : PS00022 EGF_1 EGF-like domain signature 1.
      tGCTCCTAAaTTATTTTATC-----CT---------------------------------
      Because the outer loop never reads a second line into $line1, it only prints the first LOC line.
      $pattern = '^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
Re: pattern and loop
by 2teez (Vicar) on Jun 23, 2014 at 19:34 UTC

    Hi,
    Using the file and the expected output you showed, something like this could help:

    use warnings;
    use strict;

    my $line_required;
    while (<DATA>) {
        $line_required = $_ if /^LOC/;
        if (/tAaAt/i) {
            print $line_required, $_;
        }
    }
    __DATA__
    LOC_Os01g01320.1 : PS00022 EGF_1 EGF-like domain signature 1.
    390 - 401 CaCtgCatGTgC L=(-1)
    1013 - 1024 CcCaaGgtGTtC L=(-1)
    LOC_Os01g01320.1 : PS00099 THIOLASE_3 Thiolases active site.
    976 - 989 AAGTACCAgAaAgA L=(-1)
    1269 - 1282 AAAGACGGtAaAtG L=(-1)
    1390 - 1403 GAAGACCAtAgAcA L=(-1)
    LOC_Os01g01320.1 : PS00197 2FE2S_FER_1 2Fe-2S ferredoxin-type iron-sulfur binding region signature.
    1052 - 1060 CTGAACTTC L=(-1)
    LOC_Os01g01320.1 : PS00269 DEFENSIN Mammalian defensins signature.
    868 - 896 CaCgctg.CtgaccttGtCcaactacgaCC L=(-1)
    LOC_Os01g01320.1 : PS00956 HYDROPHOBIN Fungal hydrophobins signature.
    2248 - 2259 GAgCATTTaCCT L=(-1)
    LOC_Os01g01320.1 : PS01177 ANAPHYLATOXIN_1 Anaphylatoxin domain signature.
    462 - 494 CCttGaacgcat.TCAaggaga...TatCtatggc.CC L=(-1)
    1786 - 1817 CCtgGcgttgattTCAacg......AaaCttactctCC L=(-1)
    2428 - 2461 CCatAtcttagt.GCTattggc...GcaCttatgtaCC L=(-1)
    LOC_Os01g01320.1 : PS50214 DISINTEGRIN_2 Disintegrin domain profile.
    340 - 425 GGTCATCATTTGATCACTGACAAATTTCCCaAATCCTCCGATTTCGtatgCACTGCATGT L=-1
    GCTACTGGGAaACTGATTTTGAGACC--------
    868 - 964 --------------CACGCTGctgaCCTTGTCCAactacgacCAACTGCATATCATGaaa L=-1
    ctTCCCcaatGCAGTTAGTACGTGGAaatcttccaaATATTTCCCATTTGC--------
    LOC_Os01g01320.1 : PS50842 EXPANSIN_EG45 Expansin, family-45 endoglucanase-like domain profile.
    1422 - 1494 CGATGAGTcTATaAGGGTTGATGAAATTGCCATCAACTACTCTGAAACTGGagaATCTTT L=-1
    TGATAGaaGAGCT---------------------------------------------
    2712 - 2793 TGCCGCTTgTGTtgttcAGatggaAACTGGTTATATTAAGAGCAACATCACTaagcatat L=-1
    tGCTCCTAAaTTATTTTATC-----CT---------------------------------
    ------------
    LOC_Os01g01320.1 : PS50868 POST_SET Post-SET domain profile.
    1961 - 1977 ACATCCCGAATCAGAAG L=-1
    LOC_Os01g01320.1 : PS51173 CBM2 CBM2 (Carbohydrate-binding type-2) domain signature and profile.
    2593 - 2703 ATAATTGCATTATATGAGGCATCATGTGAATGTGTATGGCTTCGCAgaatGGTTAACCAC L=-1
    ATATTAACATCTTGT--------GGTATTGGTTcATTGGAATcacCTACCATTATCTAT
    LOC_Os01g01320.1 : PS51364 TB TGF-beta binding (TB) domain profile.

    Or even a one-liner like so:
    perl -wne '$l = $_ if/^LOC/; print $l,$_ if/tAaAt/i' log_file_you_use.txt
    OUTPUT
    LOC_Os01g01320.1 : PS00099 THIOLASE_3 Thiolases active site.
    1269 - 1282 AAAGACGGtAaAtG L=(-1)
    LOC_Os01g01320.1 : PS50842 EXPANSIN_EG45 Expansin, family-45 endoglucanase-like domain profile.
    tGCTCCTAAaTTATTTTATC-----CT---------------------------------
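    The same single-pass idea can also read a file named on the command line, as the original script does. This is only a sketch: the sub name matches_with_header is made up for illustration, and the check for a missing header is an added assumption that covers a match appearing before any LOC line.

```perl
use strict;
use warnings;

# Pairs each line matching $want with the most recent line matching $header.
# Returns the lines to print, header first.
sub matches_with_header {
    my ($fh, $header, $want) = @_;
    my ($current, @out);
    while (my $line = <$fh>) {
        $current = $line if $line =~ $header;
        next unless $line =~ $want;
        # Leave the header out when none has been seen yet, which avoids an
        # "uninitialized value" warning.
        push @out, defined $current ? ($current, $line) : ($line);
    }
    return @out;
}

# Usage (file name as in the one-liner): perl script.pl log_file_you_use.txt
if (@ARGV) {
    my ($file) = @ARGV;
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    print matches_with_header($fh, qr/^LOC/, qr/tAaAt/i);
    close($fh);
}
```

    A lexical filehandle and $! in the die message are used here so that a failed open reports why it failed.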
    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: pattern and loop
by AppleFritter (Vicar) on Jun 23, 2014 at 18:37 UTC

    You've got two nested while loops, both reading from the same filehandle. That may not be what you want.

    Here's what'll happen: the outer loop will read lines from the file. As soon as a match is found (if ($line1 =~ /$pattern/)), the second, inner loop will start and continue reading from where the first loop left off. When it's done, the filehandle is exhausted, and the outer loop, noticing that there's no data left to read from the file, will terminate.

    Note that

    1. the inner loop will not start reading from the start of the file, and
    2. the outer loop will not resume from where it was before the inner loop started.

    Either of these may well be intended, expected behavior, but I'm mentioning it just in case.

    If it's not intended/expected, you can use tell and seek to save the filehandle's current position and restore it, though there'll likely be easier, more elegant ways of achieving what you want to achieve.
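    A rough sketch of the tell/seek approach, using a made-up four-line input held in memory. The sub name scan_with_rewind is hypothetical, and stopping the inner loop at the next line matching the header pattern is an assumption about where one record ends; without it, every header would be paired with every later match.

```perl
use strict;
use warnings;

# For each header line, scan the lines that follow it for $want, then seek
# back to just after the header so the outer loop continues from there.
sub scan_with_rewind {
    my ($fh, $header, $want) = @_;
    my @out;
    while (my $line1 = <$fh>) {
        next unless $line1 =~ $header;
        my $pos = tell($fh);             # position just after the header line
        while (my $line2 = <$fh>) {
            last if $line2 =~ $header;   # assumed record boundary
            push @out, $line1, $line2 if $line2 =~ $want;
        }
        seek($fh, $pos, 0) or die "seek failed: $!\n";   # 0 means SEEK_SET
    }
    return @out;
}

# Hypothetical sample input.
my $sample = "LOC a\nfoo\nLOC b\nxTAAATx\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!\n";
print scan_with_rewind($fh, qr/^LOC/, qr/TAAAT/);
close($fh);
```

    This rereads every record once more than necessary, so remembering the last header line in a variable during a single pass is usually the simpler option.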

Re: pattern and loop
by vinoth.ree (Monsignor) on Jun 23, 2014 at 14:08 UTC

    Hi,

    Sorry, I don't understand your question. Please share a sample input file and the output you expect so that we can understand your requirement.


    All is well
Re: pattern and loop
by Anonymous Monk on Jun 23, 2014 at 14:47 UTC
    $pattern = '^LOC_Os0[1-7]g[0-9]*.[0-9]\s';
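    As posted, that line is not valid Perl because the pattern needs quoting. One possible way to write and try it out is with qr//, sketched below. Escaping the dot is an assumption that a literal "." is meant between the two numbers, and the helper is_header exists only for this illustration.

```perl
use strict;
use warnings;

# The pattern from the post, compiled with qr// and with the dot escaped
# (assumption: a literal "." is intended).
my $pattern = qr/^LOC_Os0[1-7]g[0-9]*\.[0-9]\s/;

# Returns 1 if the line looks like a LOC header line, 0 otherwise.
sub is_header {
    my ($line) = @_;
    return $line =~ $pattern ? 1 : 0;
}

print is_header("LOC_Os01g01320.1 : PS00022 EGF_1 EGF-like domain signature 1.\n"), "\n";  # prints 1
print is_header("390 - 401 CaCtgCatGTgC L=(-1)\n"), "\n";                                  # prints 0
```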