discovering motifs

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I Would like to get some help. my program below is dicovering motifs by reading from a file. pi have generated motif and i am trying to match all the possible motifs with my sequence file.the motifs are well generated in lengths of 5,6.but i cannot get it to much the sequence and return the motif in the correct way and then calculate for only motifs that appear 90% in the background. Would anybody help?? Marrein

#Discovery of simple motifs in DNA sequences
#Input Files:Foreground sequences and background sequences.
#!/usr/bin/perl
#---------------------------------------------------------------------
+--
#initialize variables
#---------------------------------------------------------------------
+---

use strict;
use warnings;
my $fgcount;
my $count=0;
my $motif;
my @motifs='';  
my $dna ='';
my $seq='';
my $sequence='';
my $sequences='';
#my $line='';
my @dna = ( );
my $file = ' ';
my @file =( );
my @rna;
my $length ;
my $combinations ;
my $combinations2 ;
my $counter;

$seq='foreground_faa.txt';
#-------------------------------------------
#subroutine to open file for readability
#-------------------------------------------
sub get_file{
@dna = @_;

    unless (open(IFILE, $seq)) {
       print "could not open file sequence!\n";
          exit;
  }
@dna = <IFILE>;
close IFILE;
return @dna;
}

#--------------------------------------------   
#main program.
#--------------------------------------------

 get_file($seq); 
 $file=extract_sequence(@dna);
 join_seq($sequence);
 create_motif($combinations);
 create_motif2($combinations2);
 get_all_motifs(@motifs);
 #get_motif(@file);
 #get_motif($dna); 

#--------------------------------------------   
#subroutine to extract seq from header file
#--------------------------------------------
  sub extract_sequence  {
#initialize variables
   @dna=@_;
   $sequence='';
   
       
  
  foreach my $line (@dna){
  if ($line =~ /^\s*$/) { #discard blank line
      next;
       }elsif ($line =~ /^>/){ #discard fasta header
         next;
          }else {                   
           $sequence .=$line;
            }
           }

#print $sequence,"\n";
 return $sequence;

}
#--------------------------------
#Subroutine to join sequences
#--------------------------------
 sub join_seq {

@file = split ('',$file);
#print @file, "\n";
return @file;
 }

#---------------------------------------------------------------------
+---------
#Subroutines to discover motifs of the lenghth 5
#---------------------------------------------------------------------
+---------
sub create_motif {  #creates avery possible 5-mer

my $base = join ",", qw/A T C G/;
my $L = 5;
#my $L2 = 6;
my $string = "{$base}" x $L;
my $combinations = glob $string;

#print join ":", $combinations;
return join ":", $combinations;
}  

#---------------------------------------------------------------------
+---------
#Subroutines to discover motifs of the lenghth 6
#---------------------------------------------------------------------
+---------

sub create_motif2 {  #creates every possible 6-mer

 my $base2 = join ",", qw/a t c g/;
 my $L2 = 6;
 my $string2 = "{$base2}" x $L2;
 my $combinations2 = glob $string2;

#print join ":", $combinations2;
return join ":", $combinations2;
     }

#---------------------------------------------------------------------
+-
#subroutine check all the available motifs in the file
#---------------------------------------------------------------------
+- 
sub get_all_motifs { 

$combinations=$combinations2;
@motifs=$combinations;
print @motifs; 
return @motifs;
}

foreach  $combinations (@motifs){
   foreach $sequence(@file)  {
     if($sequence =~/$combinations/){
      $counter++;
    }
   }
                                                          
     print $sequence;
     return $sequence;
      #exit;
            }

#---------------------------------------------------------------------
+-
#subroutine check all the available motifs in the file
#---------------------------------------------------------------------
+-    
sub calculate_percentage  {
#my $percentage;

   my $percentage = $counter / scalar(@file);
     print "$percentage is ";
   
      if ($percentage =>90);
          
       #print $motif;
       #return $motif_sequences;    
        
}
[download]

Comment on discovering motifs Download Code

Replies are listed 'Best First'.
Re: discovering motifs by chromatic (Archbishop) on Apr 04, 2007 at 02:53 UTC
The problem is that you've declared lexical variables with my then use them within subroutines without passing them in explicitly or assigning to them from the return values of your subroutines. For example: `my $combinations2; # ... sub create_motif2 { #creates every possible 6-mer my $base2 = join ",", qw/a t c g/; my $L2 = 6; my $string2 = "{$base2}" x $L2; my $combinations2 = glob $string2; #print join ":", $combinations2; return join ":", $combinations2; }` [download] The `$combinations2` declared within the subroutine is fine. It's actually a really good idea. However, it shadows the `$combinations2` declared at the top of your program. That's a problem because you use the outer `$combinations2` elsewhere in the program without actually assigning to it anywhere. You could fix just this one case with: `my $combinations2 = create_motif2();</p> <p>Then pass <c>$combinations2` [download] to the subroutines that need it. Applying that type of change throughout your program will clear up many similar bugs in waiting. I can't guarantee that will make everything work, but it will help.	[reply] [d/l] [select]
Re: discovering motifs by lima1 (Curate) on Apr 04, 2007 at 11:36 UTC
If you need a fast and memory efficient solution for your problem, I recommend Bio::Grep. Your matching here `foreach $combinations (@motifs){ foreach $sequence(@file) { if($sequence =~/$combinations/){ $counter++; } }` [download] is slow. Really slow. Disclaimer: Bio::Grep is written by me ;).	[reply] [d/l]