Help regarding the outout of the program

minnie has asked for the wisdom of the Perl Monks concerning the following question:

Hi I have written a simple program to print a string with one character of the string changed to either A,C,G or T. This is a simple bioinformatics program to generate a SNP. I see the output with as well as without the print statement of the subroutine for some unknown reason.

Actual possible output (out of many expected) should be:
The DNA sequence is:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 mutation at base position 5
AAAATAAAAAAAAAAAAAAAAAAAAAAAAA

But sometimes the following is obtained:
The DNA sequence is:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[download]

(Here the sub routince call is not considered...why?)

Can anybody help me with the reason for this. Thanks in advance, minnie

Program code is as follows:

#!/usr/bin/perl -w

print "The DNA sequence is : \n" ;
my $seq = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';

print "$seq \n";
mutate ($seq);
print "\n";
exit;

sub mutate{
my($dna) = @_;
my $position = randposition($dna);


for ($i =0;$i<length($dna);$i++){

    if ($i == $position){

print " mutation at  base position ". ($position+1) . "\n";
    do {
    $nucleotide = randombase();

     } until ($nucleotide ne substr($dna,$position,1));
    substr($dna, $position, 1, $nucleotide);

    print "$dna\n";
    }
$i++;
}


}

sub randposition{
my($sequence) = @_;
return int rand length($sequence);
}

sub randombase{

my(@nucleo) =('A','T','C','G');
return $nucleo[rand(@nucleo)];
}
[download]

Comment on Help regarding the outout of the program Select or Download Code

Replies are listed 'Best First'.
Re: Help regarding the outout of the program by Your Mother (Archbishop) on Feb 24, 2010 at 04:20 UTC
Here's a reformed version. Your main problem seemed to be the extra increment of `$i` down at the bottom. There is no reason to do this with a loop if you already know the position so I removed it and inlined one of the subs. Give it a shot. `use warnings; use strict; my $seq = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'; print "The DNA sequence is : $seq\n"; mutate($seq); exit; sub mutate { my $dna = shift; my $position = int(rand(length($dna))); print " mutation at base position ". ($position+1) . "\n"; my $nucleotide; do { $nucleotide = randombase(); } until ( $nucleotide ne substr($dna,$position,1) ); substr($dna, $position, 1, $nucleotide); print "$dna\n"; } sub randombase { my ( @nucleo ) = qw ( A T C G ); return $nucleo[rand(@nucleo)]; }` [download]	[reply] [d/l] [select]
Re: Help regarding the outout of the program by jwkrahn (Abbot) on Feb 24, 2010 at 04:36 UTC
Your problem is with the for loop: `for ($i =0;$i<length($dna);$i++){ if ($i == $position){ print " mutation at base position ". ($position+1) . "\n"; do { $nucleotide = randombase(); } until ($nucleotide ne substr($dna,$position,1)); substr($dna, $position, 1, $nucleotide); print "$dna\n"; } $i++; }` [download] You increment `$i` at the top of the loop and at the bottom of the loop so sometimes `$i` will never be equal to `$position`. Better to write the loop as: `for my $i ( 0 .. length( $dna ) - 1 ) {` [download] Also, you don't need a do block for your until statement modifier: `$nucleotide = randombase() until $nucleotide ne substr $dna, $posi +tion, 1;` [download]	[reply] [d/l] [select]
Re: Help regarding the outout of the program by Utilitarian (Vicar) on Feb 24, 2010 at 08:54 UTC
Two suggestions You do not need the for loop at all, you are merely counting to 30 and doing something once when you reach a magic number. This is particularly true if the starting dna sequence is greater than 30 bases. A look up table would be faster then an "until we get a different answer", this is particularly true if you are going to mutate your starting sequence repeatedly. Try this for example #!/usr/bin/perl use strict; use warnings; print "The DNA sequence is : \n" ; my $seq = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'; print "$seq \n"; mutate ($seq); exit; sub mutate{ my($dna) = shift; my $position = randposition($dna); print " mutation at base position ". ($position+1) . "\n"; my $nucleotide = randombase(substr($dna,$position,1)); substr($dna, $position, 1, $nucleotide); print "$dna\n"; } sub randposition{ my($sequence) = @_; return int rand length($sequence); } sub randombase{ my $base=shift; my(%nucleo)=( 'A'=>['T','C','G'], 'T'=>['A','C','G'], 'C'=>['A','T','G'], 'G'=>['A','T','G'], ); return $nucleo{$base}[rand(@{$nucleo{$base}})]; } [download] `print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."`	[reply] [d/l] [select]
Re^2: Help regarding the outout of the program by roboticus (Chancellor) on Feb 24, 2010 at 12:04 UTC
Utilitarian: ++ for removing the loop. I had a thought about your randombase routine, though. I came up with: `sub randombase { # Prevent us from returning the same symbol my @list = grep { $_ ne $_[0] } @symbols; return $list[rand(@list)]; }` [download] ...roboticus	[reply] [d/l]