Automating the input to exe file

FarTech has asked for the wisdom of the Perl Monks concerning the following question:

Hi i have a problem running this code.

In the following code,i call fasta34.exe which ask u to enter some values such as sequence file name, library file name etc(all this file should be stored in perl folder).It generates a output file with this input.

Parse the output file and get the sequence which satisfies the threshold value.

i m trying to store each sequences in an array with one index value, since the sequences contains multiple lines it stores only the last line of the sequences.

my main problem here is : to store each sequence in a temporary file and give it as a input to the fasta34.exe by recursively calling it and automate all the input values to fasta34.exe, since each time the same values are entered

can anybody help me in modifying the code.

Thanks Farid

******************************

#!usr/bin/perl -w

print"\n********Running Fasta34**********  \n \n";
print"\n Please enter the file sequence and library present in your pa
+th \n\n";
$result = system("c:/perl/sam/fasta34.exe");
print"\n\n*****enter the output file name you have given***\n";
$output = <STDIN>;
open(FASTA,"c:/perl/sam/$output") or die "cant open the output file \n
+";


#Make this a function 

#Function1 

#open(FASTA,"c:/perl/sam/$output");
#@ tempseq = <FASTA>;
#$flag=0;
my $var;
my $name;
my $match;
my $seqname;
my @input;
my $iteration;
my $x = 0;
print "Enter your cut off percentage";
$cut = <STDIN>;
while (<FASTA>)
{
    #compare the line of matching sequence 
    if($name = m/>>(.{4,6})(.*)/g)
    {
        $seqname = $1;
        $laterhalf = $2;
    }
    #print "SKIP A LINE iF A MATCH \n";
    next if /^ini/;
    if ($per = m/(\d+\.\d+)% identity/g) 
    {
        $per = $1;
        #check if match is above the cutoff percentage  
        if ($per > $cut)
        {
            print "\n\n$seqname$laterhalf \n";\
            print "identity match $per % \n";
            #store the first line of the input file 
            $input[$x] = "$seqname$laterhalf";
            while(<FASTA>)
            {
                if ($match = m/^($seqname)(.*)/)
                {
                    #store the match sequence
                    print "$2 \n";
                    $input[$x] = $2;        
                }
                if(m/>>/g)
                {
                    print @input[$x];
                    #open(OUT2IN,">c:/perl/sam/output1");
                    #print OUT2IN @input;
                    #$iteration = system("OUT2IN");
                    $x = $x + 1;
                    seek(FASTA,-100,1);
                    last;
                }
            }
        }
    }
}
close (FASTA);
print $result;
[download]

********************************

sample fasta output file

**********************************

# c:\perl\sam\fasta34.exe
 FASTA searches a protein or DNA sequence data bank
 version 3.4t21 May 14, 2003
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

 hahu.aa, 141 aa
 vs mu.lib library

   1776 residues in     7 sequences
 Altschul/Gish params: n0: 141 Lambda: 0.158 K: 0.019 H: 0.100

FASTA (3.45an0 Mar 2002) function [optimized, BL50 matrix (15:-5)] ktu
+p: 2
 join: 36, opt: 24, open/ext: -10/-2, width:  16
The best scores are:                                       opt bits E(
+7)
GTM1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.  ( 217)   51  18.7  
+  0.48
GTM1_RAT GLUTATHIONE S-TRANSFERASE YB1 (EC 2.5.1.  ( 217)   42  16.7  
+   1.8
GTMU_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1  ( 217)   41  16.4  
+     2
GTMU_RABIT GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5  ( 217)   37  15.5  
+   3.4
GTM1_HUMAN GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5  ( 217)   35  15.1  
+   4.1
GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLU  ( 473)   35  15.0  
+   6.1
GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5  ( 218)   28  13.5  
+   6.5


>>GTM1_MOUSE GLUTATHIONE S-TRANSFERASE GT8.7 (EC 2.5.1.1  (217 aa)
 initn:  40 init1:  40 opt:  51  Z-score: 66.1  bits: 18.7 E(): 0.48
Smith-Waterman score: 51;  25.641% identity (27.027% ungapped) in 39 a
+a overlap (35-72:176-213)

           10        20        30        40         50        60   
HAHU   ADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFD-LSHGSAQVKGHGKKVA
                                     .::. . .. .:. :.. :: .:. .. .:
GTM1_M WFAGDKVTYVDFLAYDILDQYRMFEPKCLDAFPNLRDFLARFEGLKKISAYMKS-SRYIA
         150       160       170       180       190        200    

            70        80        90       100       110       120   
HAHU   DALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHA
         . . .::                                                   
GTM1_M TPIFSKMAHWSNK                                               
          210                                                      

>>GTM1_RAT GLUTATHIONE S-TRANSFERASE YB1 (EC 2.5.1.18) (  (217 aa)
 initn:  36 init1:  36 opt:  42  Z-score: 55.0  bits: 16.7 E():  1.8
Smith-Waterman score: 42;  34.783% identity (36.364% ungapped) in 23 a
+a overlap (35-56:176-198)

           10        20        30        40         50        60   
HAHU   ADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFD-LSHGSAQVKGHGKKVA
                                     .::. : .. .:. :.. :: .:.      
GTM1_R WFAGDKVTYVDFLAYDILDQYHIFEPKCLDAFPNLKDFLARFEGLKKISAYMKSSRYLST
         150       160       170       180       190       200     

            70        80        90       100       110       120   
HAHU   DALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHA
                                                                   
GTM1_R PIFSKLAQWSNK                                                
         210                                                       

>>GTMU_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1.18)   (217 aa)
 initn:  36 init1:  36 opt:  41  Z-score: 53.8  bits: 16.4 E():    2
Smith-Waterman score: 41;  36.364% identity (38.095% ungapped) in 22 a
+a overlap (36-56:177-198)

          10        20        30        40         50        60    
HAHU   DKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFD-LSHGSAQVKGHGKKVAD
                                     ::. : .. .:. :.. :: .:        
GTMU_C FAGDKVTLCGFLAYDVLDQYQMFEPKCLDPFPNLKDFLARFEGLKKISAYMKTSRFLRRP
        150       160       170       180       190       200      

           70        80        90       100       110       120    
HAHU   ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS
                                                                   
GTMU_C IFSKMAQWSNK                                                 
        210                                                        

>>GTMU_RABIT GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18  (217 aa)
 initn:  32 init1:  32 opt:  37  Z-score: 48.8  bits: 15.5 E():  3.4
Smith-Waterman score: 44;  29.508% identity (33.962% ungapped) in 61 a
+a overlap (31-91:17-69)

               10        20        30        40        50        60
HAHU   VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
                                     ::.: .  : : . .   . :.:    ..:
GTMU_R               PMTLGYWDVRGLALPIRMLLEY--TDTSYEEKKYTMGDAPNYDQSK
                             10        20          30        40    

               70        80        90       100       110       120
HAHU   KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
        ... .: ..    :.:: :  : :  .:::                             
GTMU_R WLSEKFTLGL----DFPN-LPYLID-GTHKLTQSNAILRYLARKHGLCGETEEERIRVDI
           50             60         70        80        90        

              130       140                                        
HAHU   VHASLDKFLASVSTVLTSKYR                                       
                                                                   
GTMU_R LENQLMDNRFQLVNVCYSPDFEKLKPEYLKGLPEKLQLYSQFLGSLPWFAGDKITFADFL
      100       110       120       130       140       150        

>>GTM1_HUMAN GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18  (217 aa)
 initn:  28 init1:  28 opt:  35  Z-score: 46.4  bits: 15.1 E():  4.1
Smith-Waterman score: 41;  30.769% identity (36.364% ungapped) in 39 a
+a overlap (24-56:160-198)

                      10        20        30             40        
HAHU          VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFL-----SFPTTKTYFPHFD-
                                     : .  :.:.:      .::. : .. .:. 
GTM1_H LPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEG
     130       140       150       160       170       180         

        50        60        70        80        90       100       
HAHU   LSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLV
       : . :: .:.                                                  
GTM1_H LEKISAYMKSSRFLPRPVFSKMAVWGNK                                
     190       200       210                                       

>>GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLUTAMAT  (473 aa)
 initn:  31 init1:  31 opt:  35  Z-score: 39.7  bits: 15.0 E():  6.1
Smith-Waterman score: 38;  30.435% identity (31.818% ungapped) in 23 a
+a overlap (96-118:447-468)

          70        80        90       100       110       120     
HAHU   LTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASL
                                     ...:: ..   . :  : : ::.       
GLNA_A SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV  
        420       430       440       450       460        470     

         130       140 
HAHU   DKFLASVSTVLTSKYR

>>GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5.1.18  (218 aa)
 initn:  28 init1:  28 opt:  28  Z-score: 37.7  bits: 13.5 E():  6.5
Smith-Waterman score: 41;  30.769% identity (36.364% ungapped) in 39 a
+a overlap (24-56:161-199)

                      10        20        30             40        
HAHU          VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFL-----SFPTTKTYFPHFD-
                                     : .  :.:.:      .::. : .. .:. 
GTM4_H LPTMMQHFSQFLGKRPWFVGDKITFVDFLAYDVLDLHRIFEPNCLDAFPNLKDFISRFEG
              140       150       160       170       180       190

        50        60        70        80        90       100       
HAHU   LSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLV
       : . :: .:.                                                  
GTM4_H LEKISAYMKSSRFLPKPLYTRVAVWGNK                                
              200       210                                        




141 residues in 1 query   sequences
1776 residues in 7 library sequences
 Scomplib [34t21]
 start: Tue Jan 18 16:24:00 2005 done: Tue Jan 18 16:24:33 2005
 Scan time:  0.000 Display time: 14.000

Function used was FASTA [version 3.4t21 May 14, 2003]
[download]

********************************

output of the code ***********************************

*****enter the output file name you have given***
g.fasta
Enter your cut off percentage30.00


GTM1_RAT GLUTATHIONE S-TRANSFERASE YB1 (EC 2.5.1.18) (  (217 aa)
identity match 34.783 %
 WFAGDKVTYVDFLAYDILDQYHIFEPKCLDAFPNLKDFLARFEGLKKISAYMKSSRYLST
 PIFSKLAQWSNK
 PIFSKLAQWSNK

GTMU_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1.18)   (217 aa)
identity match 36.364 %
 FAGDKVTLCGFLAYDVLDQYQMFEPKCLDPFPNLKDFLARFEGLKKISAYMKTSRFLRRP
 IFSKMAQWSNK
 IFSKMAQWSNK

GTM1_HUMAN GLUTATHIONE S-TRANSFERASE MU 1 (EC 2.5.1.18  (217 aa)
identity match 30.769 %
 LPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEG
 LEKISAYMKSSRFLPRPVFSKMAVWGNK
 LEKISAYMKSSRFLPRPVFSKMAVWGNK

GLNA_ANASP GLUTAMINE SYNTHETASE (EC 6.3.1.2) (GLUTAMAT  (473 aa)
identity match 30.435 %
 SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV
 SLELALEALENDHAFLTDTGVFTEDFIQNWIDYKLANEVKQMQLRPH-PYEFSIYYDV

GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MU 4 (EC 2.5.1.18  (218 aa)
identity match 30.769 %
 LPTMMQHFSQFLGKRPWFVGDKITFVDFLAYDVLDLHRIFEPNCLDAFPNLKDFISRFEG
 LEKISAYMKSSRFLPKPLYTRVAVWGNK
0
[download]

Edit by castaway - added all sorts of formatting tags

Comment on Automating the input to exe file Select or Download Code

Replies are listed 'Best First'.
Re: Automating the input to exe file by renz (Scribe) on Jan 26, 2005 at 19:22 UTC
Hi. I have a problem reading your code. If you could be so kind as to put it between code tags, I would be more inclined to assist. If you did not notice the information in the bulleted list below the preview and create buttons, please read and endeavor to conform to those directions. /renz. "The term clinical depression finds its way into too many conversations these days. One has a sense that a catastrophe has occurred in the psychic landscape." --Leonard Cohen.	[reply]
Re: Automating the input to exe file by ikegami (Patriarch) on Jan 26, 2005 at 20:34 UTC
I don't have time to figure this out right now, but I spotted some "harmless" problems: `print @input[$x];` should be `print $input[$x];` for performance and readability reasons. Why do you use `$per` in `if ($per = m/(\d+\.\d+)% identity/g)`? You promptly overwrite the value. `if ($per = m/(\d+\.\d+)% identity/g)` boils down to `if (m/(\d+\.\d+)% identity/g)` which in turn reduces to `if (m/(\d+\.\d+)% identity/)` Why do you use `$name` in `if ($name = m/>>(.{4,6})(.)/g)`? You never use it anywhere. `if ($name = m/>>(.{4,6})(.)/g)` boils down to `if (m/>>(.{4,6})(.)/g)` which in turn reduces to `if (m/>>(.{4,6})(.)/)` Why do you use `$match` in `if ($match = m/^($seqname)(.)/)`? You never use it anywhere. `if ($match = m/^($seqname)(.)/)` boils down to `if (m/^($seqname)(.*)/)` There's no use for the g modifier in `if(m/>>/g)`. It would be a good idea to check if `system` returned an error. While these changes don't have any visible effect on the code's execution, they will make your code easier to read. Anyone reading the code as is will likely lose confidence in you.	[reply] [d/l] [select]
Command line option + Fasta by FarTech (Novice) on Jan 27, 2005 at 01:24 UTC
Hi guys. Thanks for all the people who helped me I am trying to find the exhaustive homologous match of a given sequence against a given library. I run the fasta34.exe from my perl script,input the sequence & library file name which is stored in the perl folder and other input values .i got a output in the fasta format, which I parse thru and get the best sequence match above a given threshold value and store the match each at a time in a file. Now use this extracted sequence file to run the fasta again against the same library and inputs and try to call the fasta34 again and repeat the same procedure till u get the exhaustive homologous match. If u find any new sequence comparded to the original one, append to the original fasta file. Using my code I am able to extract the sequence from the orginal file which fits the criteria above a threshold using the system command in the perl with the command line i can avoid reentring the inputs The main problem the system command with the command line is not storing in a file. has anybody experienced in using the command line in fasta as well as comparing the original file with new file generated and appending the new contents to the original file Read more... (3 kB)	[reply] [d/l]
Re: Automating the input to exe file by Solo (Deacon) on Jan 26, 2005 at 20:25 UTC
I think this problem can be simplified by thinking about the file as multi-line records. Change your record separator to '>>' and process only the records meeting your threshold. Be sure to include the 'm' at the end of the regex for multi-line matching. `my $threshold = <STDIN>; # or so... my @records; { local $/ = ">>"; while (<FASTA>) { if ( /(\d+\.\d+)% identity/m ) { push @records, $_ if $1 > $threshold; # - or - # process($_); } } } # @records now holds lines meeting threshold # process as needed... # - or - # sub process { ... }` [download] --Solo -- You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.	[reply] [d/l]
Re: Automating the input to exe file by injunjoel (Priest) on Jan 26, 2005 at 20:40 UTC
Greetings all, Some documentation might be in order. Try reading up on perlsub to learn about how perl works with subroutines/functions. Also a quick note. .. If you are attempting to match something with regexp you need to use the `$var =~ m/regexpstuffhere/` format simply using `=` will not do what you are hoping. -InjunJoel "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo	[reply] [d/l] [select]
Re^2: Automating the input to exe file by ikegami (Patriarch) on Jan 26, 2005 at 20:42 UTC
That's what I thought at first, but he is matching against $_ as intended. However, he is storing the number of matches in variables he never uses, and uses the /g modifier needlessly.	[reply]