Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I Would like to get some help. my program below is dicovering motifs by reading from a file. pi have generated motif and i am trying to match all the possible motifs with my sequence file.the motifs are well generated in lengths of 5,6.but i cannot get it to much the sequence and return the motif in the correct way and then calculate for only motifs that appear 90% in the background. Would anybody help?? Marrein
#Discovery of simple motifs in DNA sequences #Input Files:Foreground sequences and background sequences. #!/usr/bin/perl #--------------------------------------------------------------------- +-- #initialize variables #--------------------------------------------------------------------- +--- use strict; use warnings; my $fgcount; my $count=0; my $motif; my @motifs=''; my $dna =''; my $seq=''; my $sequence=''; my $sequences=''; #my $line=''; my @dna = ( ); my $file = ' '; my @file =( ); my @rna; my $length ; my $combinations ; my $combinations2 ; my $counter; $seq='foreground_faa.txt'; #------------------------------------------- #subroutine to open file for readability #------------------------------------------- sub get_file{ @dna = @_; unless (open(IFILE, $seq)) { print "could not open file sequence!\n"; exit; } @dna = <IFILE>; close IFILE; return @dna; } #-------------------------------------------- #main program. #-------------------------------------------- get_file($seq); $file=extract_sequence(@dna); join_seq($sequence); create_motif($combinations); create_motif2($combinations2); get_all_motifs(@motifs); #get_motif(@file); #get_motif($dna); #-------------------------------------------- #subroutine to extract seq from header file #-------------------------------------------- sub extract_sequence { #initialize variables @dna=@_; $sequence=''; foreach my $line (@dna){ if ($line =~ /^\s*$/) { #discard blank line next; }elsif ($line =~ /^>/){ #discard fasta header next; }else { $sequence .=$line; } } #print $sequence,"\n"; return $sequence; } #-------------------------------- #Subroutine to join sequences #-------------------------------- sub join_seq { @file = split ('',$file); #print @file, "\n"; return @file; } #--------------------------------------------------------------------- +--------- #Subroutines to discover motifs of the lenghth 5 #--------------------------------------------------------------------- +--------- sub create_motif { #creates avery possible 5-mer my $base = join ",", qw/A T C G/; my $L = 5; #my $L2 = 6; my $string = "{$base}" x $L; my $combinations = glob $string; #print join ":", $combinations; return join ":", $combinations; } #--------------------------------------------------------------------- +--------- #Subroutines to discover motifs of the lenghth 6 #--------------------------------------------------------------------- +--------- sub create_motif2 { #creates every possible 6-mer my $base2 = join ",", qw/a t c g/; my $L2 = 6; my $string2 = "{$base2}" x $L2; my $combinations2 = glob $string2; #print join ":", $combinations2; return join ":", $combinations2; } #--------------------------------------------------------------------- +- #subroutine check all the available motifs in the file #--------------------------------------------------------------------- +- sub get_all_motifs { $combinations=$combinations2; @motifs=$combinations; print @motifs; return @motifs; } foreach $combinations (@motifs){ foreach $sequence(@file) { if($sequence =~/$combinations/){ $counter++; } } print $sequence; return $sequence; #exit; } #--------------------------------------------------------------------- +- #subroutine check all the available motifs in the file #--------------------------------------------------------------------- +- sub calculate_percentage { #my $percentage; my $percentage = $counter / scalar(@file); print "$percentage is "; if ($percentage =>90); #print $motif; #return $motif_sequences; }

Replies are listed 'Best First'.
Re: discovering motifs
by chromatic (Archbishop) on Apr 04, 2007 at 02:53 UTC

    The problem is that you've declared lexical variables with my then use them within subroutines without passing them in explicitly or assigning to them from the return values of your subroutines.

    For example:

    my $combinations2; # ... sub create_motif2 { #creates every possible 6-mer my $base2 = join ",", qw/a t c g/; my $L2 = 6; my $string2 = "{$base2}" x $L2; my $combinations2 = glob $string2; #print join ":", $combinations2; return join ":", $combinations2; }

    The $combinations2 declared within the subroutine is fine. It's actually a really good idea. However, it shadows the $combinations2 declared at the top of your program. That's a problem because you use the outer $combinations2 elsewhere in the program without actually assigning to it anywhere.

    You could fix just this one case with:

    my $combinations2 = create_motif2();</p> <p>Then pass <c>$combinations2
    to the subroutines that need it.

    Applying that type of change throughout your program will clear up many similar bugs in waiting. I can't guarantee that will make everything work, but it will help.

Re: discovering motifs
by lima1 (Curate) on Apr 04, 2007 at 11:36 UTC
    If you need a fast and memory efficient solution for your problem, I recommend Bio::Grep. Your matching here
    foreach $combinations (@motifs){ foreach $sequence(@file) { if($sequence =~/$combinations/){ $counter++; } }
    is slow. Really slow.

    Disclaimer: Bio::Grep is written by me ;).