listendohg has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm new to Perl and wanted to do some basic bioinformatics work where I take open reading frames (ORFs) and reverse-complement them so that I can look at the beginnings and ends of genes. The file I'm pulling from is organized like this: <sequence name 1> \t <sequence 1> \n <sequence name 2> \t <sequence 2> etc. Splitting each line on the tab lets me build a 2D array, so that I can easily pull the sequences and their names at the same time at the end. This is the code I've produced:

use strict;
use warnings;

my $filename1 = "sequences_with_upstream_stuff.fasta";
open(my $fh, '<', $filename1) or die "Cannot open $filename1: $!";
my @sequences = <$fh>;
close($fh);
chomp @sequences;

# Build a 2D array: each row is [ name, sequence ]
my @mainseq = ();
foreach my $seqline (@sequences) {
    my @temp = split("\t", $seqline);
    push(@mainseq, \@temp);
}
#print $mainseq[16][0];

my $rvscomp;
my $i = 0;
foreach (@mainseq) {
    $rvscomp = reverse $mainseq[$i][1];
    $rvscomp =~ tr/ACGT/TGCA/;    # to get the reverse complement strand
    print "$mainseq[$i][0]\n\nForward:\n\n$mainseq[$i][1]\n\nReverse:\n\n$rvscomp";
    ++$i;
}

This code works fine; if I have a sequence in the file that is, say, 900 base pairs long, the code will reverse-complement the sequence and then print the name, the forward sequence, and the reverse complement, which is what I want. However, I don't actually *need* all 900 base pairs of either the forward or the reverse sequence for my purposes. I need the first 100 or so base pairs from the forward sequence, and the first 100 or so from the reverse complement. Is there an easy way to make an if statement where I can say "once the forward sequence hits 100, stop printing it", and likewise for the reverse complement? This would make the file a little smaller and easier to look at.

Replies are listed 'Best First'.
Re: I want to print a limited subsection of certain outputs?
by 1nickt (Canon) on Feb 28, 2016 at 12:56 UTC

    Hi listendohg,

    I see that you figured out a solution. Here's a way that you might find simpler using array slices:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature 'say';

    my @x = ( 1 .. 100 );
    my @y = @x[ 0 .. 9 ];
    my @z = @x[ -10 .. -1 ];

    say join ',', @y;
    say join ',', reverse @z;
    __END__
    Output:
    1,2,3,4,5,6,7,8,9,10
    100,99,98,97,96,95,94,93,92,91
    If you don't know how many elements there are in your array:
    my @y = @x[ 0 .. 9 ];
    my @z = @x[ $#x-9 .. $#x ];

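    Note that, as with substr in your solution, assigning a slice to a new array copies the values, so changing @y or @z will not change the original array @x. Only operating on the slice itself, for example assigning to @x[ 0 .. 9 ] or looping over it with foreach, modifies @x.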
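
    The sequences in the question are strings rather than arrays, so a slice only applies once the string has been split into characters. A minimal sketch of that, assuming a made-up 12-base sequence; the helper names first_n_chars and last_n_chars are hypothetical:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helpers: slice a string by first splitting it into characters.
    sub first_n_chars {
        my ($seq, $n) = @_;
        my @chars = split //, $seq;
        $n = @chars if $n > @chars;          # never slice past the end
        return join '', @chars[ 0 .. $n - 1 ];
    }

    sub last_n_chars {
        my ($seq, $n) = @_;
        my @chars = split //, $seq;
        $n = @chars if $n > @chars;          # never slice before the start
        return join '', @chars[ @chars - $n .. $#chars ];
    }

    my $seq = 'ATGCCGTTAGGA';                # made-up example sequence
    print first_n_chars($seq, 5), "\n";      # prints ATGCC
    print last_n_chars($seq, 5),  "\n";      # prints TAGGA
    ```

    For plain strings, substr gives the same result without the intermediate array, so the slice version is mainly useful when the data is already in an array.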

    Hope this helps!

    Edit: add example without specified indices.

    The way forward always starts with a minimal test.
      Interesting, I'll look into this approach when I have to do this again. The code doesn't take long to write so I might as well start from scratch next time.
Re: I want to print a limited subsection of certain outputs?
by listendohg (Novice) on Feb 27, 2016 at 19:40 UTC

    Never mind, all. I figured out a way with a bit more tinkering: I made variables using substr that hold the first 100 base pairs of both the forward sequence and the reverse complement. The foreach loop is now a for loop that looks like this:

    for (my $i = 0; $i < scalar(@mainseq); ++$i) {
        $rvscomp = reverse $mainseq[$i][1];
        $rvscomp =~ tr/ACGT/TGCA/;                      # reverse complement
        my $fwd100 = substr($mainseq[$i][1], 0, 100);   # first 100 bases, forward
        my $rvs100 = substr($rvscomp, 0, 100);          # first 100 bases, reverse complement
        print "$mainseq[$i][0]_forward_first_100\n$fwd100\n$mainseq[$i][0]_reversecomp_first_100\n$rvs100\n";
    }
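
    The same idea could be wrapped in small subroutines so that the window size becomes a parameter. This is only a sketch: the names revcomp and ends_of_orf are hypothetical, the example record is made up, and handling lowercase bases is an assumption about the input rather than something the original file is known to need.

    ```perl
    use strict;
    use warnings;

    # Reverse complement of a DNA string (hypothetical helper).
    # Characters other than A, C, G, T (either case) are left unchanged.
    sub revcomp {
        my ($seq) = @_;
        my $rc = reverse $seq;               # scalar context: reverses the string
        $rc =~ tr/ACGTacgt/TGCAtgca/;
        return $rc;
    }

    # First $n bases of the forward strand and of the reverse complement.
    # substr simply returns the whole string when it is shorter than $n.
    sub ends_of_orf {
        my ($seq, $n) = @_;
        return ( substr($seq, 0, $n), substr(revcomp($seq), 0, $n) );
    }

    my ($name, $seq) = ('orf1', 'ATGCCGTTAGGA');   # made-up record
    my ($fwd, $rvs)  = ends_of_orf($seq, 5);
    print "${name}_forward_first_5\n$fwd\n";       # ATGCC
    print "${name}_reversecomp_first_5\n$rvs\n";   # TCCTA
    ```

    Inside the loop above, this would be called as ends_of_orf($mainseq[$i][1], 100).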