Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

Please go easy on me... i'm just a beginner.

o.k - my problem is that I want to calculate how many times a 'G' or 'C' occurs on the 3rd position in a piece of DNA starting from the first base..
e.g.   CTCCGGATCTAT A C/G is present on 3/4 3rd positions = 75%.

i am struggling a bit.

for ($i = 1; $i < @dna; $i++) { if ($dna[$i+3] eq 'C') { $counter++; } elsif ($dna[$i+3] eq 'G') { $counter++; } }
I can't think how to get the code to loop to every third base! Any help greatly appreciated.

update (broquaint): title change (was for loops) and added formatting

Replies are listed 'Best First'.
Re: for loops
by Corion (Patriarch) on Dec 16, 2002 at 19:40 UTC

    I think you are attacking the problem from a wrong angle. You are walking through your DNA string step by step, instead of jumping through it in increments of three. Also you are using a C-style loop, which is very error prone, in your case for example the first element of @dna is at $dna[0]. I'd rewrite your code in the following way, jumping forward three slots with every round :

    use strict; my $i = 2; my @dna = split //, 'CTCCGGATCTAT'; my $counter = 0; my $total = 0; while ($i < @dna) { if ($dna[$i] eq 'C') { $counter++; } elsif ($dna[$i] eq 'G') { $counter++; } $total++; $i += 3; } print "Found $counter G/C of $total elements at a 3-position\n";
    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: for loops
by Enlil (Parson) on Dec 16, 2002 at 19:44 UTC
    Your for loop is giving testing everything after @dna[4], not every third one, as the $i is only being incremented by one each time through the loop. i.e.
    Wasn't sure how to explain but here are the values of $i and $i + 3 each time through the loop:
      $i $i+3
      1 4
      2 5
      3 6
    ,etc...

    I think what you are looking for is something more along these lines (as a for loop):

    for ( my $i = 2; $i < @dna; $i = $i+3 ) { print "$dna[$i] at position $i\n"; if ( $dna[$i] eq 'C' or $dna[$i] eq 'G') { $counter++; } }
    Here the $i is incremented by three every time through the loop, which I believe is what you wanted.

    -enlil

Re: for loops
by Mr. Muskrat (Canon) on Dec 16, 2002 at 19:59 UTC

    Here is my solution:

    use strict; use warnings; my $string = "CTCCGGATCTAT"; my $sets = length($string)/3; # we want to know how man +y sets of 3 there are my $unpacker = 'A3' x $sets; # create our unpack templ +ate my @dna = unpack($unpacker, $string); # create the array of set +s my $counter = 0; # initialize the counter for my $set (@dna) { # for each $set in @dna d +o... my $check = substr($set, 2, 1); # get the 3rd char of the + $set ++$counter if ($check =~ /[GC]/i); # increment the counter, +ignore case } print "$counter/$sets was either C or G.\n"; # print results
      Why so complicated?
      use strict; use warnings; my $string = "CTCCGGATCTAT"; my $counter = 0; $counter++ while $string =~ /..[GC]/gi; print "Found $counter occurences$/";
      Oops, duh. I even tested. %-/
      use strict; use warnings; my $string = "CTCCGGATCTAT"; my %flag = (C => 1, G => 1, A => 0, T => 0); my $counter = 0; $counter += $flag{$1} while $string =~ /..(.)/g; print "Found $counter occurences$/";

      Makeshifts last the longest.

        Ugh, had to read BrowserUk's approach twice to see what was going on. Mine isn't as simple as it started out either - raaa, so obvious in hindsight: my $count = grep /[GC]/, $dna =~ /..(.)/g; There. It slices, it dices, it counts DNA bases. (Or is that RNA? Anyway.)

        Makeshifts last the longest.

        Why so complicated? ;)

        my $count=0; $count += substr($dna, $_, 1) =~ /[CG]/ for map 3 * $_, 1 .. length($d +na)/3; print $count;

        Examine what is said, not who speaks.

        Why so complicated?

        1. TIMTOWTDI
        2. The comments shed light on how it works
        3. It worked the first time. q-:
        4. Why not?

Re: for loops
by djantzen (Priest) on Dec 16, 2002 at 19:47 UTC

    Try incrementing by three rather than by one and then looking ahead three. At present, you're iterating over each element in the array and then checking to see the value of your index plus 3. So you're seeing 4, then 5, 6, 7, since you start counting at 1. Try a loop construct like:

    for ($i = 2; $i < @dna; $i += 3) # assuming your start index is 0

Re: for loops
by dingus (Friar) on Dec 17, 2002 at 10:39 UTC
    Almost certainly you will want to do some other things after this so I would recommend that you investigate some of the clever things done by the bioperl.org project. This includes a whole raft of DNA analysis stuff.

    Dingus


    Enter any 47-digit prime number to continue.
Re: for loops
by ibanix (Hermit) on Dec 18, 2002 at 00:22 UTC
    You can't get to third base with Perl. She's not that kind of girl.

    :-P

    ibanix

    $ echo '$0 & $0 &' > foo; chmod a+x foo; foo;