lairel has asked for the wisdom of the Perl Monks concerning the following question:

I am very new to Perl and have been trying to get this code right, but I am completely lost. I have tried a few variations but have no idea where I am going wrong or what I need to do. I currently get no error messages of any sort. The program is supposed to read a FASTA file, searching for the sequence VILMFWCA, and output the header that contains that sequence, followed by the sequence and its location. Here is what I have:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

unless (open (INFILE, "<", "/scratch/SampleDataFiles/test.fasta")){
    die "Unable to open file", $!;
}
local $/ = ">";
#find and print desired sequence
while (<INFILE>) {
    chomp; #always chomp
    if ( $_ =~ /^(.*?)$(.*)$/ms ) { #match first line as header
        my $header = $1; #assign the parts of the match
        my $seq = $2;
        $seq =~ s/\n//g; #get rid of whitespace
        while($seq =~ /([VILMFWCA]{8,})/g){ #search for desired sequence
            my $location = pos($seq); #find location
            my $length = length($1); #determine length
            print "Hydrophobic stretch found in: ", $header, "\n"; #printing outputs for results
            print $1, "\n";
            print "The match was at position: ", $location - $length + 1, "\n\n";
        }
    }
}
close INFILE;

Update: this is where I am now. I am getting too much output: the header is printed every time a stretch is found, instead of once per header. I tried moving the print statement up, but that didn't fix it. The output is supposed to look like this:

Hydrophobic stretch found in: P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;
AVVAAVMW
The match was at position: 325

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
VAVLMLCLAVIFLC
The match was at position: 170

LLALVAIFF
The match was at position: 493

IWICWFAALAA
The match was at position: 705

LALALAFA
The match was at position: 970

but my current output is like this:

Hydrophobic stretch found in: P30450 | Homo sapiens (Human). | NCBI_TaxID=9606; | 365 | Name=HLA-A; Synonyms=HLAA;
AVVAAVMW
The match was at position: 325

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
VAVLMLCLAVIFLC
The match was at position: 170

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
LLALVAIFF
The match was at position: 493

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
IWICWFAALAA
The match was at position: 705

Hydrophobic stretch found in: A7MBM2 | Homo sapiens (Human). | NCBI_TaxID=9606; | 1401 | Name=DISP2; Synonyms=DISPB, KIAA1742;
LALALAFA
The match was at position: 970

I'm so close

Replies are listed 'Best First'.
Re: Where am I going wrong?
by NetWallah (Canon) on Oct 20, 2015 at 16:51 UTC
    According to Wikipedia,

    There should be no space between the ">" and the first letter of the identifier

    whereas your match string requires a \t.

    It also uses unnecessary capturing parens.

    Try this:

    if ($sequence =~/^>VILMFWCA/){ ...
    Or paste a small relevant sample of your data.

            The best defense against logic is ignorance.

Re: Where am I going wrong?
by toolic (Bishop) on Oct 20, 2015 at 15:51 UTC

      I know the file isn't empty; I initially wrote the program to just print the file so I could see what it looked like. I made the change you suggested and still didn't get any output. The FASTA file has a header line followed by a sequence line for each sequence. I need the program to search each sequence for the desired code and, if it is found, print the corresponding header, followed by the matching sequence and its location.
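The behaviour described above (one header per record, then every stretch and its position) could be sketched roughly as follows. This is only a sketch under assumptions: the helper names hydrophobic_stretches and report are made up for illustration, the sample data is invented rather than taken from the thread, and "the desired code" is taken to mean a run of 8 or more residues from the set VILMFWCA, as in the posted regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns [stretch, 1-based position] pairs for every
# run of 8 or more residues drawn from the set VILMFWCA.
sub hydrophobic_stretches {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /([VILMFWCA]{8,})/g) {
        # pos() is the offset just past the match, so step back by its length
        push @hits, [ $1, pos($seq) - length($1) + 1 ];
    }
    return @hits;
}

# Hypothetical helper: builds the report for FASTA text read from $fh.
sub report {
    my ($fh) = @_;
    local $/ = ">";                     # read one ">"-delimited record at a time
    my $out = '';
    while (my $record = <$fh>) {
        chomp $record;                  # strip the trailing ">" separator
        next unless $record =~ /\S/;    # skip the empty chunk before the first ">"
        my ($header, $seq) = split /\n/, $record, 2;
        next unless defined $seq;
        $seq =~ s/\s+//g;               # join wrapped sequence lines
        my @hits = hydrophobic_stretches($seq);
        next unless @hits;
        # The header is added once per record, outside the per-hit loop.
        $out .= "Hydrophobic stretch found in: $header\n";
        $out .= "$_->[0]\nThe match was at position: $_->[1]\n\n" for @hits;
    }
    return $out;
}

# Made-up sample data, not taken from the thread.
my $sample = ">seq1 demo\nGGGGAVVAAVMWGGGG\nGGLLALVAIFFGG\n>seq2 none\nGGGGGGGG\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print report($fh);
close $fh;
```

Collecting the hits first and printing the header only when the list is non-empty is one way to avoid both the repeated header and headers for records with no match; setting a flag inside the original while loop would work too.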

Re: Where am I going wrong?
by choroba (Cardinal) on Oct 20, 2015 at 16:48 UTC
    Please, show us a sample input. The code (after some tweaking) seems to produce something with files like the following:
    > VILMFWCAxxxx
    GACTACTACTACTACTACTACTACTACTACTACTACTACTACTACTACTACTACG

    Note that using while ($string =~ /.../) doesn't make much sense if you don't change the $string - it either doesn't match, or it matches forever. Maybe you wanted to add a /g at the end? Also, (^>) seems like a typo; did you want [^>] instead?
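As a small illustration of the /g point, here is a sketch (the helper name match_ends and the test string are made up for this example). With /g in scalar context, each pass through the loop resumes where the previous match ended and pos() reports that offset; without /g, the same loop would keep matching at the first occurrence forever.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the offset just past each match of /AB/.
sub match_ends {
    my ($string) = @_;
    my @ends;
    while ($string =~ /AB/g) {      # /g makes each test resume after the last match
        push @ends, pos($string);   # offset just past the current match
    }
    return @ends;
}

print join(",", match_ends("ABxxABxAB")), "\n";   # prints 2,6,9
```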

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ