FarTech has asked for the wisdom of the Perl Monks concerning the following question:

Hi i have a problem running this code.

In the following code,i call fasta34.exe which ask u to enter some values such as sequence file name, library file name etc(all this file should be stored in perl folder).It generates a output file with this input.

Parse the output file and get the sequence which satisfies the threshold value.

i m trying to store each sequences in an array with one index value, since the sequences contains multiple lines it stores only the last line of the sequences.

my main problem here is : to store each sequence in a temporary file and give it as a input to the fasta34.exe by recursively calling it and automate all the input values to fasta34.exe, since each time the same values are entered

can anybody help me in modifying the code.

Thanks Farid

******************************
#!usr/bin/perl -w print"\n********Running Fasta34********** \n \n"; print"\n Please enter the file sequence and library present in your pa +th \n\n"; $result = system("c:/perl/sam/fasta34.exe"); print"\n\n*****enter the output file name you have given***\n"; $output = <STDIN>; open(FASTA,"c:/perl/sam/$output") or die "cant open the output file \n +"; #Make this a function #Function1 #open(FASTA,"c:/perl/sam/$output"); #@ tempseq = <FASTA>; #$flag=0; my $var; my $name; my $match; my $seqname; my @input; my $iteration; my $x = 0; print "Enter your cut off percentage"; $cut = <STDIN>; while (<FASTA>) { #compare the line of matching sequence if($name = m/>>(.{4,6})(.*)/g) { $seqname = $1; $laterhalf = $2; } #print "SKIP A LINE iF A MATCH \n"; next if /^ini/; if ($per = m/(\d+\.\d+)% identity/g) { $per = $1; #check if match is above the cutoff percentage if ($per > $cut) { print "\n\n$seqname$laterhalf \n";\ print "identity match $per % \n"; #store the first line of the input file $input[$x] = "$seqname$laterhalf"; while(<FASTA>) { if ($match = m/^($seqname)(.*)/) { #store the match sequence print "$2 \n"; $input[$x] = $2; } if(m/>>/g) { print @input[$x]; #open(OUT2IN,">c:/perl/sam/output1"); #print OUT2IN @input; #$iteration = system("OUT2IN"); $x = $x + 1; seek(FASTA,-100,1); last; } } } } } close (FASTA); print $result;
********************************

sample fasta output file

**********************************

# c:\perl\sam\fasta34.exe FASTA searches a protein or DNA sequence data bank version 3.4t21 May 14, 2003 Please cite: W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448 hahu.aa, 141 aa vs mu.lib library 1776 residues in 7 sequences Altschul/Gish params: n0: 141 Lambda: 0.158 K: 0.019 H: 0.100 FASTA (3.45an0 Mar 2002) function [optimized, BL50 matrix (15:-5)] ktu +p: 2 join: 36, opt: 24, open/ext: -10/-2, width: 16 The best scores are: opt bits E( +7) GTM1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2. ( 217) 51 18.7 + 0.48 GTM1_RAT GLUTATHIONE S-TRANSFERASE YB1 (EC 2.5.1. ( 217) 42 16.7 + 1.8 GTMU_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1 ( 217) 41 16.4 + 2 GTMU_RABIT GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5 ( 217) 37 15.5 + 3.4 GTM1_HUMAN GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5 ( 217) 35 15.1 + 4.1 GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLU ( 473) 35 15.0 + 6.1 GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5 ( 218) 28 13.5 + 6.5 >>GTM1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.1 (217 aa) initn: 40 init1: 40 opt: 51 Z-score: 66.1 bits: 18.7 E(): 0.48 Smith-Waterman score: 51; 25.641% identity (27.027% ungapped) in 39 a +a overlap (35-72:176-213) 10 20 30 40 50 60 HAHU ADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFD-LSHGSAQVKGHGKKVA .::. . .. .:. :.. :: .:. .. .: GTM1_M WFAGDKVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKS-SRYIA 150 160 170 180 190 200 70 80 90 100 110 120 HAHU DALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHA . . .:: GTM1_M TPIFSKMAHWSNK 210 >>GTM1_RAT GLUTATHIONE S-TRANSFERASE YB1 (EC 2.5.1.18) ( (217 aa) initn: 36 init1: 36 opt: 42 Z-score: 55.0 bits: 16.7 E(): 1.8 Smith-Waterman score: 42; 34.783% identity (36.364% ungapped) in 23 a +a overlap (35-56:176-198) 10 20 30 40 50 60 HAHU ADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFD-LSHGSAQVKGHGKKVA .::. : .. .:. :.. :: .:. GTM1_R WFAGDKVTYVDFLAYDILDQYHIFEPKCLDAFPNLKDFLARFEGLKKISAYMKSSRYLST 150 160 170 180 190 200 70 80 90 100 110 120 HAHU DALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHA GTM1_R PIFSKLAQWSNK 210 >>GTMU_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1.18) (217 aa) initn: 36 init1: 36 opt: 41 Z-score: 53.8 bits: 16.4 E(): 2 Smith-Waterman score: 41; 36.364% identity (38.095% ungapped) in 22 a +a overlap (36-56:177-198) 10 20 30 40 50 60 HAHU DKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFD-LSHGSAQVKGHGKKVAD ::. : .. .:. :.. :: .: GTMU_C FAGDKVTLCGFLAYDVLDQYQMFEPKCLDPFPNLKDFLARFEGLKKISAYMKTSRFLRRP 150 160 170 180 190 200 70 80 90 100 110 120 HAHU ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS GTMU_C IFSKMAQWSNK 210 >>GTMU_RABIT GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18 (217 aa) initn: 32 init1: 32 opt: 37 Z-score: 48.8 bits: 15.5 E(): 3.4 Smith-Waterman score: 44; 29.508% identity (33.962% ungapped) in 61 a +a overlap (31-91:17-69) 10 20 30 40 50 60 HAHU VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK ::.: . : : . . . :.: ..: GTMU_R PMTLGYWDVRGLALPIRMLLEY--TDTSYEEKKYTMGDAPNYDQSK 10 20 30 40 70 80 90 100 110 120 HAHU KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA ... .: .. :.:: : : : .::: GTMU_R WLSEKFTLGL----DFPN-LPYLID-GTHKLTQSNAILRYLARKHGLCGETEEERIRVDI 50 60 70 80 90 130 140 HAHU VHASLDKFLASVSTVLTSKYR GTMU_R LENQLMDNRFQLVNVCYSPDFEKLKPEYLKGLPEKLQLYSQFLGSLPWFAGDKITFADFL 100 110 120 130 140 150 >>GTM1_HUMAN GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18 (217 aa) initn: 28 init1: 28 opt: 35 Z-score: 46.4 bits: 15.1 E(): 4.1 Smith-Waterman score: 41; 30.769% identity (36.364% ungapped) in 39 a +a overlap (24-56:160-198) 10 20 30 40 HAHU VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFL-----SFPTTKTYFPHFD- : . :.:.: .::. : .. .:. GTM1_H LPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEG 130 140 150 160 170 180 50 60 70 80 90 100 HAHU LSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLV : . :: .:. GTM1_H LEKISAYMKSSRFLPRPVFSKMAVWGNK 190 200 210 >>GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLUTAMAT (473 aa) initn: 31 init1: 31 opt: 35 Z-score: 39.7 bits: 15.0 E(): 6.1 Smith-Waterman score: 38; 30.435% identity (31.818% ungapped) in 23 a +a overlap (96-118:447-468) 70 80 90 100 110 120 HAHU LTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASL ...:: .. . : : : ::. GLNA_A SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV 420 430 440 450 460 470 130 140 HAHU DKFLASVSTVLTSKYR >>GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5.1.18 (218 aa) initn: 28 init1: 28 opt: 28 Z-score: 37.7 bits: 13.5 E(): 6.5 Smith-Waterman score: 41; 30.769% identity (36.364% ungapped) in 39 a +a overlap (24-56:161-199) 10 20 30 40 HAHU VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFL-----SFPTTKTYFPHFD- : . :.:.: .::. : .. .:. GTM4_H LPTMMQHFSQFLGKRPWFVGDKITFVDFLAYDVLDLHRIFEPNCLDAFPNLKDFISRFEG 140 150 160 170 180 190 50 60 70 80 90 100 HAHU LSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLV : . :: .:. GTM4_H LEKISAYMKSSRFLPKPLYTRVAVWGNK 200 210 141 residues in 1 query sequences 1776 residues in 7 library sequences Scomplib [34t21] start: Tue Jan 18 16:24:00 2005 done: Tue Jan 18 16:24:33 2005 Scan time: 0.000 Display time: 14.000 Function used was FASTA [version 3.4t21 May 14, 2003]
********************************

output of the code ***********************************

*****enter the output file name you have given*** g.fasta Enter your cut off percentage30.00 GTM1_RAT GLUTATHIONE S-TRANSFERASE YB1 (EC 2.5.1.18) ( (217 aa) identity match 34.783 % WFAGDKVTYVDFLAYDILDQYHIFEPKCLDAFPNLKDFLARFEGLKKISAYMKSSRYLST PIFSKLAQWSNK PIFSKLAQWSNK GTMU_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1.18) (217 aa) identity match 36.364 % FAGDKVTLCGFLAYDVLDQYQMFEPKCLDPFPNLKDFLARFEGLKKISAYMKTSRFLRRP IFSKMAQWSNK IFSKMAQWSNK GTM1_HUMAN GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18 (217 aa) identity match 30.769 % LPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEG LEKISAYMKSSRFLPRPVFSKMAVWGNK LEKISAYMKSSRFLPRPVFSKMAVWGNK GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLUTAMAT (473 aa) identity match 30.435 % SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5.1.18 (218 aa) identity match 30.769 % LPTMMQHFSQFLGKRPWFVGDKITFVDFLAYDVLDLHRIFEPNCLDAFPNLKDFISRFEG LEKISAYMKSSRFLPKPLYTRVAVWGNK 0

Edit by castaway - added all sorts of formatting tags

Replies are listed 'Best First'.
Re: Automating the input to exe file
by renz (Scribe) on Jan 26, 2005 at 19:22 UTC

    Hi. I have a problem reading your code. If you could be so kind as to put it between code tags, I would be more inclined to assist.

    If you did not notice the information in the bulleted list below the preview and create buttons, please read and endeavor to conform to those directions.

    /renz.
    "The term clinical depression finds its way into too many conversations these days. One has a sense that a catastrophe has occurred in the psychic landscape." --Leonard Cohen.
Re: Automating the input to exe file
by ikegami (Patriarch) on Jan 26, 2005 at 20:34 UTC

    I don't have time to figure this out right now, but I spotted some "harmless" problems:

    print @input[$x]; should be print $input[$x]; for performance and readability reasons.

    Why do you use $per in if ($per = m/(\d+\.\d+)% identity/g)? You promptly overwrite the value.
    if ($per = m/(\d+\.\d+)% identity/g)
    boils down to
    if (m/(\d+\.\d+)% identity/g)
    which in turn reduces to
    if (m/(\d+\.\d+)% identity/)

    Why do you use $name in if ($name = m/>>(.{4,6})(.*)/g)? You never use it anywhere.
    if ($name = m/>>(.{4,6})(.*)/g)
    boils down to
    if (m/>>(.{4,6})(.*)/g)
    which in turn reduces to
    if (m/>>(.{4,6})(.*)/)

    Why do you use $match in if ($match = m/^($seqname)(.*)/)? You never use it anywhere.
    if ($match = m/^($seqname)(.*)/)
    boils down to
    if (m/^($seqname)(.*)/)

    There's no use for the g modifier in if(m/>>/g).

    It would be a good idea to check if system returned an error.

    While these changes don't have any visible effect on the code's execution, they will make your code easier to read. Anyone reading the code as is will likely lose confidence in you.

Command line option + Fasta
by FarTech (Novice) on Jan 27, 2005 at 01:24 UTC
    Hi guys.

    Thanks for all the people who helped me

    I am trying to find the exhaustive homologous match of a given sequence against a given library.

    I run the fasta34.exe from my perl script,input the sequence & library file name which is stored in the perl folder and other input values .i got a output in the fasta format, which I parse thru and get the best sequence match above a given threshold value and store the match each at a time in a file. Now use this extracted sequence file to run the fasta again against the same library and inputs and try to call the fasta34 again and repeat the same procedure till u get the exhaustive homologous match. If u find any new sequence comparded to the original one, append to the original fasta file.

    Using my code I am able to extract the sequence from the orginal file which fits the criteria above a threshold using the system command in the perl with the command line i can avoid reentring the inputs

    The main problem the system command with the command line is not storing in a file.

    has anybody experienced in using the command line in fasta as well as comparing the original file with new file generated and appending the new contents to the original file

Re: Automating the input to exe file
by Solo (Deacon) on Jan 26, 2005 at 20:25 UTC
    I think this problem can be simplified by thinking about the file as multi-line records. Change your record separator to '>>' and process only the records meeting your threshold. Be sure to include the 'm' at the end of the regex for multi-line matching.

    my $threshold = <STDIN>; # or so... my @records; { local $/ = ">>"; while (<FASTA>) { if ( /(\d+\.\d+)% identity/m ) { push @records, $_ if $1 > $threshold; # - or - # process($_); } } } # @records now holds lines meeting threshold # process as needed... # - or - # sub process { ... }

    --Solo

    --
    You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
Re: Automating the input to exe file
by injunjoel (Priest) on Jan 26, 2005 at 20:40 UTC
    Greetings all,
    Some documentation might be in order. Try reading up on perlsub to learn about how perl works with subroutines/functions. Also a quick note. .. If you are attempting to match something with regexp you need to use the $var =~ m/regexpstuffhere/ format simply using = will not do what you are hoping.

    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
      That's what I thought at first, but he is matching against $_ as intended. However, he is storing the number of matches in variables he never uses, and uses the /g modifier needlessly.