Hello reebee3,
Here’s one way to approach this task:
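#! perl
use strict;
use warnings;

my (%seqs, $id, $dna);

while (my $line = <>)
{
    chomp $line;

    if ($line =~ / ^ > (.+) /x)
    {
        # A new header line: store the previous record (if any) first
        $seqs{$id} = $dna if defined $id;
        $id  = $1;
        $dna = '';
    }
    else
    {
        # A sequence line: append it to the current record
        $dna .= $line;
    }
}

# Store the final record, which has no following header to trigger the save
$seqs{$id} = $dna if defined $id;

for my $key (sort { length $seqs{$a} <=> length $seqs{$b} } keys %seqs)
{
    printf "%s:%d\n", $key, length $seqs{$key};
}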
Output:
15:55 >perl 1406_SoPW.pl data.fas
SequenceID|9876_Gene2:15
SequenceID|1234_Gene1:16
15:55 >
Notes:
- The above code contains no error checking! In particular, it doesn’t check that the fasta file format is valid. You say “I do not want to use BioPerl”, but a dedicated module is usually better and safer than hand-written code.
- The special filehandle <> reads from the file(s) specified on the command line (or from standard input if no files are specified). For other approaches, see perlopentut#Opening-Text-Files-for-Reading.
- You say you want to sort the data by length, but you don’t specify the sort order. I have assumed increasing order. If you want decreasing order instead, reverse the occurrences of $a and $b: sort { length $seqs{$b} <=> length $seqs{$a} }
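To make the notes above concrete, here is a minimal sketch (not a full FASTA validator) that combines an explicit open with a few basic checks and the decreasing sort order. The subroutine name read_fasta is hypothetical, and treating a duplicate ID or sequence data before the first header as fatal errors is my assumption about what "valid" should mean for your data:

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): read one FASTA file
# into a hash of ID => sequence, dying on the problems it can detect.
sub read_fasta
{
    my ($path) = @_;

    # Explicit 3-argument open with error checking, instead of <>
    open(my $fh, '<', $path)
        or die "Cannot open '$path' for reading: $!\n";

    my (%seqs, $id);

    while (my $line = <$fh>)
    {
        chomp $line;
        next if $line =~ / ^ \s* $ /x;          # skip blank lines

        if ($line =~ / ^ > (.+) /x)
        {
            $id = $1;
            # Assumption: a repeated ID is an error, not an overwrite
            die "Duplicate ID '$id' at line $.\n" if exists $seqs{$id};
            $seqs{$id} = '';
        }
        else
        {
            # Assumption: data before any header means a malformed file
            die "Sequence data before first header at line $.\n"
                unless defined $id;
            $seqs{$id} .= $line;
        }
    }

    close $fh or die "Cannot close '$path': $!\n";

    return \%seqs;
}

# Only run the report when a file name is given on the command line
if (@ARGV)
{
    my $seqs = read_fasta($ARGV[0]);

    # Decreasing order of length: note that $b comes before $a
    for my $key (sort { length $seqs->{$b} <=> length $seqs->{$a} }
                 keys %$seqs)
    {
        printf "%s:%d\n", $key, length $seqs->{$key};
    }
}
```

Run it the same way as the first script, with the data file as its only argument. It does not check that the sequence lines contain only valid nucleotide codes, so it is still no substitute for a dedicated module.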
Hope that helps,