Here’s one way to approach this task:

#! perl
use strict;
use warnings;

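my (%seqs, $id, $dna);    # ID => sequence, plus the current record's ID and data

while (my $line = <>)
{
    chomp $line;

    if ($line =~ / ^ > (.+) /x)    # header line: the text after '>' is the ID
    {
        $seqs{$id} = $dna if defined $id;    # store the previous record, if any
        $id        = $1;
        $dna       = '';
    }
    else
    {
        $dna      .= $line;    # sequence data may span several lines
    }
}

$seqs{$id} = $dna if defined $id;    # store the final record

# Print each ID with its sequence length, shortest first
for my $key (sort { length $seqs{$a} <=>
                    length $seqs{$b} } keys %seqs)
{
    printf "%s:%d\n", $key, length $seqs{$key};
}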

Output:

15:55 >perl 1406_SoPW.pl data.fas
SequenceID|9876_Gene2:15
SequenceID|1234_Gene1:16

15:55 >

Notes:

The above code contains no error checking! In particular, it doesn’t check that the input is valid FASTA. You say “I do not want to use BioPerl”, but a dedicated module is usually better and safer than hand-written code.
The special filehandle <> reads from the file(s) specified on the command line (or from standard input if no files are specified). For other approaches, see perlopentut#Opening-Text-Files-for-Reading.
You say you want to sort the data by length, but you don’t specify the sort order. I have assumed increasing order. If you want decreasing order instead, swap $a and $b in the sort block: sort { length $seqs{$b} <=> length $seqs{$a} }
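
As a rough sketch of the checking that the first note says is missing, combined with an explicitly opened filehandle and a decreasing sort, something like the following could serve as a starting point. The sub name parse_fasta, the error messages, the rule that a sequence line may contain only letters, '-' and '*', and the sample records demo_a and demo_b are all assumptions made for illustration; none of them comes from the original script or from any formal FASTA specification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): read FASTA records
# from an open filehandle and die on the most obvious format problems.
sub parse_fasta
{
    my ($fh, $name) = @_;
    my (%seqs, $id);

    # Report an error together with the source name and current line number
    my $fail = sub { die sprintf "%s, line %d: %s\n", $name, $., $_[0] };

    while (my $line = <$fh>)
    {
        $line =~ s/\r?\n\z//;                 # remove LF or CRLF line ending
        next if $line =~ /^\s*$/;             # skip blank lines

        if ($line =~ /^>\s*(\S.*?)\s*$/)      # header with a non-empty ID
        {
            $id = $1;
            $fail->("duplicate ID '$id'") if exists $seqs{$id};
            $seqs{$id} = '';
        }
        elsif ($line =~ /^>/)
        {
            $fail->('header line without an ID');
        }
        else
        {
            $fail->('sequence data before the first header')
                unless defined $id;

            # Assumption: only letters, '-' (gap) and '*' are acceptable
            $fail->('unexpected character in sequence')
                if $line =~ /[^A-Za-z*-]/;

            $seqs{$id} .= $line;
        }
    }

    # A header that was never followed by any data is treated as an error
    for my $key (sort keys %seqs)
    {
        die "No sequence data for '$key' in $name\n" if $seqs{$key} eq '';
    }

    return \%seqs;
}

# Made-up sample data, read through an in-memory filehandle
my $sample = ">demo_a\nACGTAC\nGT\n>demo_b\nACG\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $seqs = parse_fasta($fh, 'sample');
close $fh;

# Longest first; sequences of equal length are ordered by ID
for my $key (sort { length $seqs->{$b} <=> length $seqs->{$a}
                    or $a cmp $b } keys %$seqs)
{
    printf "%s:%d\n", $key, length $seqs->{$key};
}
# prints:
# demo_a:8
# demo_b:3
```

To read a real file instead, the in-memory open can be replaced with open my $fh, '<', $path or die "Cannot open '$path': $!"; where $path holds the file name. Breaking ties by ID is a design choice that keeps the output order repeatable when two sequences have the same length.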

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

In reply to Re: Creating hash from data extracted from text file in fasta format by Athanasius
in thread Creating hash from data extracted from text file in fasta format by reebee3
