NodeReaper has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality.

Re: splitting data advice required
by Bloodnok (Vicar) on May 13, 2009 at 14:11 UTC
    IMHO, a hash would be the logical choice for the representation of the data...
    use warnings;
    use strict;
    use Data::Dumper;

    my (%result, $key, $val);
    while (<DATA>) {
        if (/^>(.*)$/) {
            $result{$key} = $val if $val;
            $key = $1;
            $val = '';
            next;
        }
        chomp;
        $val .= $_;
    }
    continue {
        $result{$key} = $val if $val;
    }
    print Dumper \%result;
    __DATA__
    >SEQ1
    -----I--RL--AAIDVDG-NLT----------D--R--D-RL-ISTKA-IESIRS--A-
    -E-K--------K-GLT-VSL----LS------GN-V----I-PVV---YA-L------K
    IF---------------L-----GINGPVF------------------------------
    >SEQ2
    -MKI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A-
    -E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------S
    IL---------------I-----G----TS-----------------GP-VV--------
    >SEQ3
    --KI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A-
    -E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------S
    IL---------------I-----G----TS-----------------GP-VV--------
    ---AE--D------GG---A---------------------------------------I
    $VAR1 = {
              'SEQ3' => '--KI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A--E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------SIL---------------I-----G----TS-----------------GP-VV-----------AE--D------GG---A---------------------------------------I',
              'SEQ2' => '-MKI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A--E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------SIL---------------I-----G----TS-----------------GP-VV--------',
              'SEQ1' => '-----I--RL--AAIDVDG-NLT----------D--R--D-RL-ISTKA-IESIRS--A--E-K--------K-GLT-VSL----LS------GN-V----I-PVV---YA-L------KIF---------------L-----GINGPVF------------------------------'
            };
    A user level that continues to overstate my experience :-))
Re: splitting data advice required
by almut (Canon) on May 13, 2009 at 14:06 UTC
    I thought initially I could just split on newline, but I can't due to the sequence data occurring over several lines

    Split on entire records, a record being header + sequence.  It looks like the appropriate record separator would be newline followed by '>' (i.e. "\n>"). Then split each record into header and sequence, by treating stuff up to the first newline as header.
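
    A minimal sketch of that approach, assuming the input looks like the FASTA-style data shown elsewhere in this thread. The sub name parse_records and the sample data are made up for illustration; the record separator is set through Perl's $/ variable.

```perl
use strict;
use warnings;

# Read whole records by setting the input record separator to "\n>",
# so each read returns one header line plus its sequence lines.
# parse_records is a hypothetical helper name, not from the thread.
sub parse_records {
    my ($fh) = @_;
    my %seq;
    local $/ = "\n>";            # record separator: newline followed by '>'
    while (my $record = <$fh>) {
        chomp $record;           # strips a trailing "\n>" if one is present
        $record =~ s/^>//;       # only the first record keeps its leading '>'
        # Everything up to the first newline is the header, the rest is sequence.
        my ($header, $sequence) = split /\n/, $record, 2;
        next unless defined $header && length $header;
        $sequence = '' unless defined $sequence;
        $sequence =~ s/\s+//g;   # join the wrapped sequence lines into one string
        $seq{$header} = $sequence;
    }
    return \%seq;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = ">SEQ1\n-MKI--\n--KA--\n>SEQ2\nAB--\nCD--\n";
open my $in, '<', \$sample or die "cannot open in-memory handle: $!";
my $seqs = parse_records($in);
close $in;

print "$_ => $seqs->{$_}\n" for sort keys %$seqs;
```

    With a real file you would pass a handle opened on that file instead of the in-memory one. The local keeps the change to $/ from leaking into other reads in the program.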