Sofie has asked for the wisdom of the Perl Monks concerning the following question:
Hi
I have a fasta file containing a number of sequences.
I would like to extract only the sequences that are of a specific length (let's say >10 nt).
The sequences are wrapped with newlines, so when I read the file into an array, each line becomes an element. How can I remove these newlines? I have tried join, but it also removes the newline between sequences and puts everything into a single string. How do I separate each sequence into its own element or string?
The file looks something like this:
>NM_001 Homo sapiens ADA2 (CECR1)
GATCCAA
>NM_002 Homo sapiens IKBKG
GGAGGTCTTTAGCTTTAGGGAAACCC
Output should be only the second sequence in this case.
#!/usr/bin/perl -w
#open the fastfile Genes.fasta
open (GENES, "Genes.fasta") or die "Could not open file";
chomp (@seq = <GENES>);
$seq = join ("\n", @seq);
$lengthseq = length $seq;
#min length of the seq
$minlength = 10;
#if length is over a certain size, print
if ($lengthseq > $minlength){
    print $seq;
}
else {print "No sequence is over $minlength;"
}
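One possible way to approach the question above, offered as a sketch rather than a definitive answer: read the file line by line, start a new record at every line beginning with `>`, and append every other line to the current record's sequence. Wrapped lines are then joined without newlines, while each sequence stays in its own element. The sub name `long_sequences` and the hash keys `header` and `seq` are hypothetical names chosen for this illustration, and the sample data is held in memory (with the second sequence wrapped over two lines) so the example runs without Genes.fasta.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle and return those whose sequence
# is longer than $min_length. Each returned element is a hash reference
# with the keys 'header' and 'seq' (hypothetical names for this sketch).
sub long_sequences {
    my ($fh, $min_length) = @_;
    my @records;
    my $current;

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;        # strip the newline and any trailing whitespace
        next if $line eq '';       # skip blank lines
        if ($line =~ /^>(.*)/) {
            # A header line starts a new record.
            $current = { header => $1, seq => '' };
            push @records, $current;
        }
        elsif ($current) {
            # Any other line continues the current sequence,
            # so wrapped lines are joined without newlines.
            $current->{seq} .= $line;
        }
    }
    return grep { length($_->{seq}) > $min_length } @records;
}

# Demonstration on in-memory data instead of Genes.fasta.
my $fasta = <<'END';
>NM_001 Homo sapiens ADA2 (CECR1)
GATCCAA
>NM_002 Homo sapiens IKBKG
GGAGGTCTTTAGC
TTTAGGGAAACCC
END

open my $fh, '<', \$fasta or die "Could not open in-memory file: $!";
my @long = long_sequences($fh, 10);
close $fh;

# Print each surviving record as a header line plus one unwrapped sequence line.
print ">$_->{header}\n$_->{seq}\n" for @long;
```

With the sample data this should print only the NM_002 record, since its joined sequence is 26 nt long and the NM_001 sequence is 7 nt. To run it against a real file, the in-memory open could be replaced with `open my $fh, '<', 'Genes.fasta' or die "Could not open file: $!";`.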
Replies are listed 'Best First'.
Re: Size of sequences in fastafile
by zubenel0 (Sexton) on Mar 01, 2020 at 07:18 UTC
Re: Size of sequences in fastafile
by hippo (Bishop) on Feb 29, 2020 at 14:08 UTC
by Sofie (Acolyte) on Feb 29, 2020 at 16:17 UTC
by hippo (Bishop) on Feb 29, 2020 at 16:51 UTC
by Sofie (Acolyte) on Mar 01, 2020 at 12:03 UTC
by zubenel0 (Sexton) on Mar 01, 2020 at 12:39 UTC
by Sofie (Acolyte) on Mar 01, 2020 at 11:03 UTC
Re: Size of sequences in fastafile
by BillKSmith (Monsignor) on Feb 29, 2020 at 20:23 UTC
Re: Size of sequences in fastafile
by kcott (Archbishop) on Mar 02, 2020 at 08:33 UTC