Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have an array where each element represents a different sequence. I simply want to divide each sequence in @seq into triplets, e.g. gcttctgtc -> gct tct gtc, and then separate the letters into three new categories based on the three positions within each triplet. For example, for gct tct gtc:

category I = positions 1+2 = gc tc gt = gctcgt
category II = positions 2+3 = ct ct tc = ctcttc
category III = positions 3+1 = tg tt cg = tgttcg

This part is fine. Where I am struggling is category III when there isn't a letter in position 3: I want to take it from the first position of the next sequence. E.g. in the sequences below, if there is no 3rd letter in the last triplet, I want to take the 1st letter from the next sequence (in this case 'a' from 'agtcatgcatgact') and use it in its place.

I hope someone can help my confusion!

# @seq contains the following:
# gcttctgtc
# agtcatgcatgact
# gcgtatcatgactgcatgatatgctgct
# gactgagcactgtgactgcatg

for (my $i=0; $i<@seq; $i++) {
    my @gene = $seq[$i];
    my $s = join ('', @gene);
    @gene = split ('', $s);
    print "GENE @gene\n";

    # FORMAT TYPE I CODING: POSITIONS 1+2
    for (my $i=0; $i<@gene; $i+=3) {
        push @gene_type1, "$gene[$i]";
        push @gene_type1, "$gene[$i+1]";
    }

    # FORMAT TYPE 2 CODING: POSITIONS 2+3
    for (my $i=0; $i<@gene; $i+=3) {
        push @gene_type2, "$gene[$i+1]";
        push @gene_type2, "$gene[$i+2]";
    }

    # FORMAT TYPE 3 CODING POSITIONS 3+1
    for (my $i=0; $i<@gene; $i+=3) {
        if ($gene[$i+2] !~ /\s+/) {
            push @gene_type3, "$gene[$i+2]";
            push @gene_type3, "$gene[$i]";
        }
    }

    push @gene_type1, "\n";
    push @gene_type2, "\n";
    push @gene_type3, "\n";
}

Updated Steve_p - removed wrapping blank lines in the code.

Replies are listed 'Best First'.
Re: array and counting
by bart (Canon) on Oct 04, 2004 at 10:49 UTC
    Append the first 3 characters from the next sequence to the end of the current sequence. Keep a variable holding how many items you started with, and use that to control the loop, instead of using $i<@gene directly:
    my @gene = split //, $seq[$i];
    my $count = @gene;
    if($i < $#seq) {
        push @gene, split //, substr $seq[$i+1], 0, 3;
    }
    # Now, proceed as before:
    for (my $i=0; $i<$count; $i+=3) {
        ...
    }

    P.S. You don't actually need to split the strings into arrays; plain use of substr will work just as well, as in:

    substr($genestring, $i, 1)
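
    Putting the two suggestions above together, a minimal sketch might look like the following. It is only one possible reading of the requirement: the helper name split_categories and its variables are hypothetical, the borrowed letters are used for all three categories (as appending to the sequence implies), and a category III pair is skipped when no third letter is available at all, which is an assumption about the desired behaviour for the last sequence.

```perl
use strict;
use warnings;

# Sample data from the question.
my @seq = qw(
    gcttctgtc
    agtcatgcatgact
    gcgtatcatgactgcatgatatgctgct
    gactgagcactgtgactgcatg
);

# split_categories is a hypothetical helper, not part of the original thread.
# It returns the category I, II and III strings for sequence number $i.
sub split_categories {
    my ($seqs, $i) = @_;
    my $len = length $seqs->[$i];    # how many letters we started with

    # Append up to 3 letters from the next sequence, if there is one.
    my $padded = $seqs->[$i];
    $padded .= substr($seqs->[$i + 1], 0, 3) if $i < $#$seqs;

    my ($cat1, $cat2, $cat3) = ('', '', '');
    for (my $pos = 0; $pos < $len; $pos += 3) {
        # Take one triplet; pad with empty strings if it is short,
        # which happens when nothing is left to borrow.
        my ($p1, $p2, $p3) =
            (split(//, substr($padded, $pos, 3)), '', '')[0 .. 2];
        $cat1 .= $p1 . $p2;
        $cat2 .= $p2 . $p3;
        # Assumption: skip the pair when there is no 3rd letter at all.
        $cat3 .= $p3 . $p1 if length $p3;
    }
    return ($cat1, $cat2, $cat3);
}

for my $i (0 .. $#seq) {
    my ($cat1, $cat2, $cat3) = split_categories(\@seq, $i);
    print "in  : $seq[$i]\n";
    print "out1: $cat1\nout2: $cat2\nout3: $cat3\n";
}
```

    For the second sequence, agtcatgcatgact, the last triplet is only "ct", so the sketch borrows "g" from the start of the third sequence and category III comes out as tatcagatgc.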
Re: array and counting
by si_lence (Deacon) on Oct 04, 2004 at 12:29 UTC
    Hi,
    I would go for something like this:
    foreach (@seq) {
        my ($res1, $res2, $res3);
        print "in : $_\n";
        while (/(\w)(\w)(\w)/g) {
            $res1 .= "$1$2";
            $res2 .= "$2$3";
            $res3 .= "$3$1";
        };
        print "out1: $res1\n";
        print "out2: $res2\n";
        print "out3: $res3\n";
    };

    You can skip the print statements of course.
    si_lence