minnie has asked for the wisdom of the Perl Monks concerning the following question:

Hi I have written a simple program to print a string with one character of the string changed to either A,C,G or T. This is a simple bioinformatics program to generate a SNP. I see the output with as well as without the print statement of the subroutine for some unknown reason.

Actual possible output (out of many expected) should be: The DNA sequence is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA mutation at base position 5 AAAATAAAAAAAAAAAAAAAAAAAAAAAAA But sometimes the following is obtained: The DNA sequence is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
(Here the sub routince call is not considered...why?)

Can anybody help me with the reason for this. Thanks in advance, minnie

Program code is as follows:
#!/usr/bin/perl -w print "The DNA sequence is : \n" ; my $seq = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'; print "$seq \n"; mutate ($seq); print "\n"; exit; sub mutate{ my($dna) = @_; my $position = randposition($dna); for ($i =0;$i<length($dna);$i++){ if ($i == $position){ print " mutation at base position ". ($position+1) . "\n"; do { $nucleotide = randombase(); } until ($nucleotide ne substr($dna,$position,1)); substr($dna, $position, 1, $nucleotide); print "$dna\n"; } $i++; } } sub randposition{ my($sequence) = @_; return int rand length($sequence); } sub randombase{ my(@nucleo) =('A','T','C','G'); return $nucleo[rand(@nucleo)]; }

Replies are listed 'Best First'.
Re: Help regarding the outout of the program
by Your Mother (Archbishop) on Feb 24, 2010 at 04:20 UTC

    Here's a reformed version. Your main problem seemed to be the extra increment of $i down at the bottom. There is no reason to do this with a loop if you already know the position so I removed it and inlined one of the subs. Give it a shot.

    use warnings; use strict; my $seq = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'; print "The DNA sequence is : $seq\n"; mutate($seq); exit; sub mutate { my $dna = shift; my $position = int(rand(length($dna))); print " mutation at base position ". ($position+1) . "\n"; my $nucleotide; do { $nucleotide = randombase(); } until ( $nucleotide ne substr($dna,$position,1) ); substr($dna, $position, 1, $nucleotide); print "$dna\n"; } sub randombase { my ( @nucleo ) = qw ( A T C G ); return $nucleo[rand(@nucleo)]; }
Re: Help regarding the outout of the program
by jwkrahn (Abbot) on Feb 24, 2010 at 04:36 UTC

    Your problem is with the for loop:

    for ($i =0;$i<length($dna);$i++){ if ($i == $position){ print " mutation at base position ". ($position+1) . "\n"; do { $nucleotide = randombase(); } until ($nucleotide ne substr($dna,$position,1)); substr($dna, $position, 1, $nucleotide); print "$dna\n"; } $i++; }

    You increment $i at the top of the loop and at the bottom of the loop so sometimes $i will never be equal to $position.

    Better to write the loop as:

    for my $i ( 0 .. length( $dna ) - 1 ) {

    Also, you don't need a do block for your until statement modifier:

    $nucleotide = randombase() until $nucleotide ne substr $dna, $posi +tion, 1;
Re: Help regarding the outout of the program
by Utilitarian (Vicar) on Feb 24, 2010 at 08:54 UTC
    Two suggestions
    1. You do not need the for loop at all, you are merely counting to 30 and doing something once when you reach a magic number. This is particularly true if the starting dna sequence is greater than 30 bases.
    2. A look up table would be faster then an "until we get a different answer", this is particularly true if you are going to mutate your starting sequence repeatedly.
    Try this for example
    #!/usr/bin/perl use strict; use warnings; print "The DNA sequence is : \n" ; my $seq = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'; print "$seq \n"; mutate ($seq); exit; sub mutate{ my($dna) = shift; my $position = randposition($dna); print " mutation at base position ". ($position+1) . "\n"; my $nucleotide = randombase(substr($dna,$position,1)); substr($dna, $position, 1, $nucleotide); print "$dna\n"; } sub randposition{ my($sequence) = @_; return int rand length($sequence); } sub randombase{ my $base=shift; my(%nucleo)=( 'A'=>['T','C','G'], 'T'=>['A','C','G'], 'C'=>['A','T','G'], 'G'=>['A','T','G'], ); return $nucleo{$base}[rand(@{$nucleo{$base}})]; }

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

      Utilitarian:

      ++ for removing the loop. I had a thought about your randombase routine, though. I came up with:

      sub randombase { # Prevent us from returning the same symbol my @list = grep { $_ ne $_[0] } @symbols; return $list[rand(@list)]; }

      ...roboticus