lairel has asked for the wisdom of the Perl Monks concerning the following question:

Most of my code runs properly, but the header is repeated for every match instead of being printed once for all the matches in a sequence. I don't think I am explaining that well, so I have included my current output along with the desired output. My code:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

unless (open (INFILE, "<", "/scratch/SampleDataFiles/test.fasta")){
    die "Unable to open file", $!;
}

local $/ = ">";

#find and print desired sequence
while (<INFILE>) {
    chomp; #always chomp
    if ( $_ =~ /^(.*?)$(.*)$/ms ) { #match first line as header
        my $header = $1; #assign the parts of the match
        my $seq = $2;
        $seq =~ s/\n//g; #get rid of whitespace
        while($seq =~ /([VILMFWCA]{8,})/g){ #search for desired sequence
            my $location = pos($seq); #find location
            my $length = length($1); #determine length
            print "Hydrophobic stretch found in: ", $header, "\n"; #printing outputs for results
            print $1, "\n";
            print "The match was at position: ", $location - $length + 1, "\n\n";
        }
    }
}
close INFILE;

The output is supposed to look like this:

Hydrophobic stretch found in: P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;
AVVAAVMW
The match was at position: 325

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
VAVLMLCLAVIFLC
The match was at position: 170

LLALVAIFF
The match was at position: 493

IWICWFAALAA
The match was at position: 705

LALALAFA
The match was at position: 970

but my current output is like this:

Hydrophobic stretch found in: P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;
AVVAAVMW
The match was at position: 325

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
VAVLMLCLAVIFLC
The match was at position: 170

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
LLALVAIFF
The match was at position: 493

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
IWICWFAALAA
The match was at position: 705

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
LALALAFA
The match was at position: 970

I'm so close

Re: One header for multiple results
by AppleFritter (Vicar) on Oct 21, 2015 at 14:51 UTC

    Well, you ARE outputting the "Hydrophobic stretch found" line each time there is a match:

    while($seq =~ /([VILMFWCA]{8,})/g){ #search for desired sequence
        my $location = pos($seq); #find location
        my $length = length($1); #determine length
        print "Hydrophobic stretch found in: ", $header, "\n"; #printing outputs for results
        print $1, "\n";
        print "The match was at position: ", $location - $length + 1, "\n\n";
    }

    If you only want to print it once, the easiest way is to introduce a flag indicating whether it's already been output. Declare a new variable in front of your while loop there, and then check and set it inside the loop body:

    my $header_printed = 0; # header for this thingamabob was not yet printed
    while($seq =~ /([VILMFWCA]{8,})/g){ #search for desired sequence
        my $location = pos($seq); #find location
        my $length = length($1); #determine length
        unless($header_printed) {
            print "Hydrophobic stretch found in: ", $header, "\n"; #printing outputs for results
            $header_printed = 1;
        }
        print $1, "\n";
        print "The match was at position: ", $location - $length + 1, "\n\n";
    }

    Obviously this could be written more concisely, but this way it's clear what's going on.
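
    For instance, one more concise variant might build the report in a string and add the header only while that string is still empty. This is just a sketch under assumptions: the report_stretches helper and the demo record are made up for illustration, and it uses $-[1] (the start offset of the first capture group) in place of the pos/length arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the report
# text for one record. The header is added once, and only if at least one
# hydrophobic stretch is found.
sub report_stretches {
    my ($header, $seq) = @_;
    my $out = '';
    while ($seq =~ /([VILMFWCA]{8,})/g) {
        # $-[1] is the zero-based offset where capture group 1 starts,
        # so adding 1 gives the same 1-based position as pos - length + 1.
        my $position = $-[1] + 1;
        # The string is still empty only on the first match.
        $out .= "Hydrophobic stretch found in: $header\n" unless length $out;
        $out .= "$1\nThe match was at position: $position\n\n";
    }
    return $out;
}

# Made-up record for illustration only.
print report_stretches('demo record', 'GGGAVVAAVMWGGGKKLLALVAIFFKK');
```

    For the demo record this reports AVVAAVMW at position 4 and LLALVAIFF at position 17 under a single header, and a record without any stretch yields an empty string.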

    Does this help you?

      Thank you! It did exactly what I wanted.
Re: One header for multiple results
by AnomalousMonk (Archbishop) on Oct 21, 2015 at 15:00 UTC
Re: One header for multiple results
by Laurent_R (Canon) on Oct 21, 2015 at 14:40 UTC
    It would be much easier to answer if you provided a sample of your input data, as that would help us find out where your code goes wrong. Without it, we can only make wild guesses or take shots in the dark.

    I would also suggest that you indent your code consistently; it would probably help you find out for yourself what you're doing wrong.

Re: One header for multiple results
by GotToBTru (Prior) on Oct 21, 2015 at 14:58 UTC

    You have the line that prints the header in your inner while loop. Of course you get it with every sequence. Move that statement to before your loop, and you will see it only once.
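
    One thing to watch for: with the print placed before the inner loop, the header also appears for records that contain no stretch at all. A possible way around that, sketched here with a made-up find_stretches helper and made-up records (neither is from the original script), is to collect the matches first and print the header only when there is at least one:

```perl
use strict;
use warnings;

# Hypothetical helper: collects [stretch, 1-based position] pairs
# for a single sequence.
sub find_stretches {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /([VILMFWCA]{8,})/g) {
        push @hits, [ $1, pos($seq) - length($1) + 1 ];
    }
    return @hits;
}

# Made-up records for illustration only: [header, sequence].
my @records = (
    [ 'record_one', 'GGGAVVAAVMWGGG' ],
    [ 'record_two', 'GGGGKKKK' ],    # no stretch, so no header is printed
);

for my $rec (@records) {
    my ($header, $seq) = @$rec;
    my @hits = find_stretches($seq);
    next unless @hits;               # skip records without any match
    print "Hydrophobic stretch found in: $header\n";
    print "$_->[0]\nThe match was at position: $_->[1]\n\n" for @hits;
}
```

    The trade-off is that all matches for a record are held in memory before anything is printed, which should be negligible for protein-sized sequences.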

    Dum Spiro Spero
      I had tried that, but then it would bunch the headers before the results. I appreciate the suggestion!

        Not if you did what I suggested ... but I see in the end you have an answer.

        Dum Spiro Spero