Bio_student has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I've written perl codes below. My wish that this code should give output : how many times each motif match with the sequences ($dna), but $regexp (variable contain multiple motifs) only read the last motif instead of 11 motifs (in subroutine its give 11 motifs, but when I call the function its give only 1 motif). Please help me what's wrong with my scripts. Thanks in advance.

use strict; use warnings; my @file_data = (); my %motif_hash = (); my $dna; my $regexp; my @positions = (); my $site; my @motifs; my $match; my $count; @file_data = get_file_data("promoters.txt"); $dna = extract_sequence(@file_data); $regexp = parseMOTIF('motifs.txt'); print "Regexp: '$regexp'\n"; $match = match_positions($count); push(@positions,$match); print "match : $match"; sub match_positions{ my($regexp,$dna) = @_; my $count= 0; my @positions; my $regexp; my $dna; while ($dna =~ /$regexp/g){ push(@positions,pos($dna)-length($&)+1); ++$count; } return $count; } sub parseMOTIF { use strict; use warnings; my @motiffile = (); my $name; my $site; my $regexp; my %motif_hash = (); my $motiffile = $_[0]; @motiffile = get_file_data($motiffile); foreach $motiffile(@motiffile){ if($motiffile =~ /(^[A|S]\d+\s+|^[A|S]\d+b\s+)([A-Z]+)(\s+.*)$ +/){ my $name = $1; my $site = $2; $regexp = IUB_to_regexp($site); $motif_hash{$name} = "$site $regexp\n"; print "motif : $site\n"; print "The regexp : $regexp\n"; } } return $regexp; } sub get_file_data{ my ($filename)=@_; use strict; use warnings; my @filedata=(); unless(open(GET_FILE_DATA, $filename)){ print STDERR "can't open file $filename\"\n\n"; exit; } @filedata = <GET_FILE_DATA>; close GET_FILE_DATA; return @filedata; } sub extract_sequence { use strict; my (@file_data) = @_; use warnings; my $sequence = ""; foreach my $line(@file_data) { if ($line =~ /^>/){ next; } else { $sequence .= $line; } } #$sequence =~ s/\s//g; return $sequence; } sub IUB_to_regexp{ my($iub) = @_; my $regular_expression =""; my %iub2char_class = ( A =>'A', C =>'C', G =>'G', T =>'T', R =>'GA', Y =>'CT', M =>'AC', K =>'GT', S =>'GC', W =>'AT', B =>'CGT', D =>'AGT', H =>'ACT', V =>'ACG', N =>'ACGT', ); for (my $i=0;$i<length($iub);$i++){ $regular_expression .= $iub2char_class{substr($iub,$i,1)}; } return $regular_expression; }

Replies are listed 'Best First'.
Re: codes error
by bart (Canon) on Dec 10, 2010 at 12:14 UTC
    Here's your problem:
    foreach $motiffile(@motiffile){ if($motiffile =~ /(^[A|S]\d+\s+|^[A|S]\d+b\s+)([A-Z]+)(\s+.*)$ +/){ my $name = $1; my $site = $2; $regexp = IUB_to_regexp($site); $motif_hash{$name} = "$site $regexp\n"; print "motif : $site\n"; print "The regexp : $regexp\n"; } } return $regexp; ...
    You start to loop through the array @motiffile, but as you get through to the end of this snippet, you return from the sub with one result, thus leaving the loop.

    What you can do is collect all results, for example by using this instead of the return statement:

    push @regexp, $regexp
    (and of course, @regexp must be declared higher up in the sub), and after the loop block, return a combined regexp:
    return join '|', @regexp;

    update Oops, I misread the code (bad indentation), you return after the loop block, instead of in it... But the end result is nearly the same, except now you return the last $regexp instead of the first.

    Put my push statement in the loop block.

    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: codes error :(
by Sinistral (Monsignor) on Dec 10, 2010 at 14:09 UTC

    Humm, ACTG, I sense genetic analysis is the reason for writing this code. Whenever I see someone trying to do such things, I want to point out the existence of BioPerl. Head on over to the main BioPerl site to get modules, instruction, and information that should help.

Re: codes error :(
by Anonymous Monk on Dec 10, 2010 at 10:57 UTC
    Please help me what's wrong with my scripts.

    Well, for starters you did not write them :)

      Which one?