Where am I going wrong?

lairel has asked for the wisdom of the Perl Monks concerning the following question:

I am very new to Perl and have been trying to get this code right, but I am completely lost. I have tried a few variations but have no idea where I am going wrong or what I need to do. I currently get no error messages of any sort. The program is supposed to read a FASTA file, searching for the sequence VILMFWCA. The output should be the header of the record that contains that sequence, followed by the matching sequence and its location. Here is what I have:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

unless (open (INFILE, "<", "/scratch/SampleDataFiles/test.fasta")){
        die "Unable to open file", $!;
}

local $/ = ">";
#find and print desired sequence

while (<INFILE>) {
        chomp; #always chomp
        if ( $_ =~ /^(.*?)$(.*)$/ms ) { #match first line as header
                my $header = $1; #assign the parts of the match
                my $seq = $2;
                $seq =~ s/\n//g; #get rid of whitespace
                while($seq =~ /([VILMFWCA]{8,})/g){ #search for desired sequence
                        my $location = pos($seq); #find location
                        my $length = length($1); #determine length
                        print "Hydrophobic stretch found in: ", $header, "\n"; #printing outputs for results
                        print $1, "\n";
                        print "The match was at position: ", $location - $length + 1, "\n\n";
                }
        }
}
close INFILE;

Update: this is where I am now, but I am getting too much output. Every time it finds a stretch it prints the header, instead of printing it once per header. I tried moving the print statement up, but that didn't fix it. The output is supposed to look like this:

Hydrophobic stretch found in: P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;

 

AVVAAVMW

The match was at position: 325

 

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;

 

VAVLMLCLAVIFLC

The match was at position: 170

 

LLALVAIFF

The match was at position: 493

 

IWICWFAALAA

The match was at position: 705

 

LALALAFA

The match was at position: 970

but my current output is like this:

Hydrophobic stretch found in: P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 |    Name=HLA-A; Synonyms=HLAA;
AVVAAVMW
The match was at position: 325

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 |    Name=DISP2; Synonyms=DISPB, KIAA1742;
VAVLMLCLAVIFLC
The match was at position: 170

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 |    Name=DISP2; Synonyms=DISPB, KIAA1742;
LLALVAIFF
The match was at position: 493

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 |    Name=DISP2; Synonyms=DISPB, KIAA1742;
IWICWFAALAA
The match was at position: 705

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 |    Name=DISP2; Synonyms=DISPB, KIAA1742;
LALALAFA
The match was at position: 970

I'm so close
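
The repeated header comes from the header print sitting inside the inner `while` loop, so it runs once per match. A minimal sketch of one way around that is a per-record flag that is set the first time the header is printed. The helper name `report_stretches`, the demo header, and the demo sequence below are made up for illustration; only the regex and the position arithmetic come from the code above, and the exact blank-line layout is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the report text for ONE FASTA record.
# $header is the header line, $seq is the sequence with newlines removed.
sub report_stretches {
    my ($header, $seq) = @_;
    my $out            = '';
    my $printed_header = 0;    # reset for every record

    while ($seq =~ /([VILMFWCA]{8,})/g) {
        # pos() is the offset just past the match, so this is the 1-based start
        my $start = pos($seq) - length($1) + 1;

        # Emit the header only before the first match in this record
        unless ($printed_header) {
            $out .= "Hydrophobic stretch found in: $header\n";
            $printed_header = 1;
        }
        $out .= "$1\nThe match was at position: $start\n\n";
    }
    return $out;    # empty string when the record has no stretch
}

# Made-up record: two stretches, starting at positions 5 and 17
my $demo_seq = 'GGGG' . 'AVVAAVMW' . 'GGGG' . 'LLALVAIFF' . 'GG';
print report_stretches('demo | made-up header', $demo_seq);
```

In the original loop, the same effect should follow from declaring the flag right after `my $seq = $2;` and guarding the header print with it, so the flag starts at zero again for each record.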


Re: Where am I going wrong? by NetWallah (Canon) on Oct 20, 2015 at 16:51 UTC
According to Wikipedia, there should be no space between the ">" and the first letter of the identifier, whereas your match string requires a \t. It also uses unnecessary capturing parens. Try this: `if ($sequence =~/^>VILMFWCA/){ ...` Or paste a small relevant sample of your data. The best defense against logic is ignorance.
Re: Where am I going wrong? by toolic (Bishop) on Oct 20, 2015 at 15:51 UTC
If your code does not display the "Hydrophobic stretch found in:" message, then the input file could exist but be empty. Or the input file might not have a line beginning with >, followed by a tab, followed by VILMFWCA. Is there a tab? You could try: `if ($sequence =~ /(^>)\s*(VILMFWCA)/){` Is this a job for http://www.bioperl.org/wiki/Main_Page? See also: How do I compose an effective node title?
Re^2: Where am I going wrong? by lairel (Novice) on Oct 20, 2015 at 16:27 UTC
I know the file isn't empty: I initially wrote the program to just print the file so I could see what it looked like. I made the change you suggested and still didn't get any output. The FASTA file has a header line followed by a sequence line for each sequence. I need the program to search each sequence for the desired code and, if it is found, print the corresponding header, then the matching sequence and its location.
Re: Where am I going wrong? by choroba (Cardinal) on Oct 20, 2015 at 16:48 UTC
Please show us a sample input. The code (after some tweaking) seems to produce something with files like the following: `> VILMFWCAxxxx GACTACTACTACTACTACTACTACTACTACTACTACTACTACTACTACTACTACG` Note that using `while ($string =~ /.../)` doesn't make much sense if you don't change the $string: it either doesn't match, or it matches forever. Maybe you wanted to add a `/g` at the end? Also, `(^>)` seems like a typo. Did you want `[^>]` instead? لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
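
To illustrate the point about `/g`: in scalar context, a match with `/g` resumes from where the previous match ended, so the `while` loop visits each match once and then stops. A small sketch, where the helper name `match_starts` and the test string are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based start position of every match of $re in $string.
sub match_starts {
    my ($string, $re) = @_;
    my @starts;

    # With /g, each iteration continues after the previous match.
    # Without /g, the same first match would be found again on every
    # iteration and the loop would never end.
    while ($string =~ /$re/g) {
        push @starts, $-[0] + 1;    # $-[0] is the 0-based offset of the match start
    }
    return @starts;
}

my @starts = match_starts('xxAAAxxAAAAx', qr/A{3,}/);
print "@starts\n";    # prints "3 8"
```

Using `$-[0]` gives the start offset directly, which avoids computing it from `pos()` and the match length.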