Re: splitting data advice requested

How about this?

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

local $/ = ">";
my %hash;
while (<DATA>) {
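    s/>//;    # strip the trailing ">" record separator
    # Match the full sequence number (\d+) and store only when the name
    # was actually found, rather than relying on a possibly stale $1.
    $hash{">$1"} = $_ if s/^(SEQ\d+)//;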
}

print Dumper(\%hash);

__DATA__
>SEQ1
-----I--RL--AAIDVDG-NLT----------D--R--D-RL-ISTKA-IESIRS--A-
-E-K--------K-GLT-VSL----LS------GN-V----I-PVV---YA-L------K
IF---------------L-----GINGPVF------------------------------
>SEQ2
-MKI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A-
-E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------S
IL---------------I-----G----TS-----------------GP-VV--------
>SEQ3
--KI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A-
-E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------S
IL---------------I-----G----TS-----------------GP-VV--------
---AE--D------GG---A---------------------------------------I

Replies are listed 'Best First'.
Re^2: splitting data advice requested by oxone (Friar) on May 13, 2009 at 22:19 UTC
I like your approach: it avoids the problem with the "full file slurping" options suggested above (which won't scale to very large input files), and it recognises that '>' is a handy delimiter here. A small suggested improvement, which removes the line breaks as the OP asked and is a bit shorter: `... while (<DATA>) { $hash{">$1"} = $2 if s/[>\n]//g && /^(SEQ\d+)(.*)/; } ...`
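
For anyone who wants to try the idea outside a `__DATA__` block, here is a minimal self-contained sketch of the same record-at-a-time technique. It is only an illustration under stated assumptions: the `parse_records` helper and the sample input are made up for this example and do not come from the thread, and the substitution is run as its own statement instead of being chained with `&&`, so a record is still stored even if nothing was stripped from it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read ">"-delimited records
# from a filehandle and return a hash ref keyed by ">SEQn".
sub parse_records {
    my ($fh) = @_;
    local $/ = ">";    # read one ">"-terminated record at a time
    my %hash;
    while ( my $record = <$fh> ) {
        $record =~ s/[>\n]//g;    # drop the delimiter and all line breaks
        # The name is "SEQ" plus digits; the remainder is the joined sequence.
        $hash{">$1"} = $2 if $record =~ /^(SEQ\d+)(.*)/;
    }
    return \%hash;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = ">SEQ1\n--AB-\n-CD--\n>SEQ2\n-EF--\n";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $records = parse_records($fh);
close $fh;

print "$_ => $records->{$_}\n" for sort keys %$records;
```

Because only one record is held in memory at a time, this should behave the same way on a large file opened with a normal `open my $fh, '<', $path`. One caveat: the pattern assumes the sequence data never starts with a digit, since a leading digit would be absorbed into the name by `\d+`.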