Hi all, I'm new to Perl and wanted to do some basic bioinformatics work where I take open reading frames (ORFs) and reverse-complement them so that I can look at the beginnings and ends of genes. The file I'm pulling from is organized like this: <sequence name 1> \t <sequence 1> \n <sequence name 2> \t <sequence 2> etc. This lets me split each line on the tab and build a 2D array, so that I can easily pull the sequences and their names together at the end. This is the code I've produced:
    use strict;
    use warnings;

    my $filename1 = "sequences_with_upstream_stuff.fasta";
    open(my $fh, '<', $filename1) or die "Cannot open $filename1: $!";
    my @sequences = <$fh>;
    close($fh);
    chomp @sequences;

    # Split each line on the tab into [name, sequence]
    my @mainseq = ();
    foreach my $seqline (@sequences) {
        my @temp = split("\t", $seqline);
        push(@mainseq, \@temp);
    }
    #print $mainseq[16][0];

    my $rvscomp;
    my $i = 0;
    foreach (@mainseq) {
        $rvscomp = reverse $mainseq[$i][1];
        $rvscomp =~ tr/ACGT/TGCA/;    # to get the reverse complement strand
        print "$mainseq[$i][0]\n\nForward:\n\n$mainseq[$i][1]\n\nReverse:\n\n$rvscomp";
        ++$i;
    }
This code works fine; if I have a sequence in the file that is, say, 900 base pairs long, the code will reverse-complement the sequence and then print the name, the forward sequence, and the reverse complement, which is what I want. However, I don't actually *need* all 900 base pairs of either the forward or the reverse strand for my purposes. I need the first 100 or so base pairs from the forward sequence, and the first 100 or so from the reverse complement. Is there an easy way to make an if statement where I can say "once the forward sequence hits 100, stop printing it", and likewise for the reverse complement? This would make the output a little smaller and easier to look at.
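To make it concrete, here is a rough sketch of the kind of output I'm after. It uses Perl's built-in substr rather than an if statement, which may or may not be the best way to do it. The helper names first_bases and reverse_complement, the $limit variable, and the example name and sequence are all made up for illustration and are not part of my real script or data.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $len leading characters of $seq.
# substr simply returns the whole string when it is shorter than $len.
sub first_bases {
    my ($seq, $len) = @_;
    return substr($seq, 0, $len);
}

# Hypothetical helper: reverse the string, then swap each base for its
# complement (same reverse + tr approach as in the script above).
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;    # scalar assignment, so the string is reversed
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

my $limit   = 100;              # assumed cutoff ("100 or so" base pairs)
my $name    = "example_orf";    # made-up sequence name
my $forward = "ATGGCC" x 30;    # made-up 180-base sequence

my $fwd_head = first_bases($forward, $limit);
my $rev_head = first_bases(reverse_complement($forward), $limit);

print "$name\n\nForward:\n\n$fwd_head\n\nReverse:\n\n$rev_head\n\n";
```

In the real script, the same two calls would presumably go inside the foreach loop, with $mainseq[$i][1] in place of $forward.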