in reply to how to split

An alternative is to read the file line by line while maintaining a state variable that records whether you are in a header or a result section. I also keep an array reference that points at either @headers or @lines, so each line is pushed onto the correct array.

use strict;
use warnings;

use Data::Dumper;

# Read the sample data through a filehandle opened on a reference to a heredoc string.
open my $inFH, q{<}, \ <<'END_OF_FILE' or die qq{open: $!\n};
>1gmz_B mol:protein length:122 PHOSPHOLIPASE A2
 Length = 122

 Score = 103 bits (233), Expect = 9e-23
 Identities = 56/124 (45%), Positives = 67/124 (54%), Gaps = 12/124 (9%)

Query: 2 LWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSC 61
 LWQF MI K P + YGCYCG+GG G P D DRCC HD CY KL SC
Sbjct: 2 LWQFGKMI-LKETGKLPFPYYVTYGCYCGVGGRGGPKDATDRCCFVHDCCY---GKLTSC 57

Query: 62 KVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICFSK--VPYNKEHKNLD 119
 K P T+ YSYS + I C EN+ C IC CD+ AA+CF + YNK++ +
Sbjct: 58 K-----PKTDRYSYSRKDGTIVC-GENDPCRKEICECDKAAAVCFRENLDTYNKKYMSYL 111

Query: 120 KKNC 123
 K C
Sbjct: 112 KSLC 115

>1b4w_A mol:protein length:122 PROTEIN (PHOSPHOLIPASE A2)
 Length = 122

 Score = 95.7 bits (215), Expect = 2e-20
 Identities = 46/105 (43%), Positives = 61/105 (58%), Gaps = 10/105 (9%)

Query: 2 LWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSC 61
 L QF MIK K+ EP++ + YGCYCG GG G P D DRCC HD CY +K+ C
Sbjct: 2 LLQFRKMIK-KMTGKEPVVSYAFYGCYCGSGGRGKPKDATDRCCFVHDCCY---EKVTGC 57

Query: 62 KVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICF 106
 +P ++Y+YS N I C + + C+ +C CD+ AAICF
Sbjct: 58 -----DPKWDDYTYSWKNGTIVCGGD-DPCKKEVCECDKAAAICF 96

END_OF_FILE

my $inHeader;           # true while we are inside a header section
my $pushTo;             # reference to whichever array is currently being filled
my @headers = ();
my @lines   = ();

while ( <$inFH> )
{
    chomp;

    # A line starting with ">" begins a new header section.
    if ( not $inHeader and m{^>} )
    {
        $inHeader = 1;
        $pushTo   = \ @headers;
    }

    # The first "Query:" line ends the header and begins the result lines.
    if ( $inHeader and m{^Query:} )
    {
        $inHeader = 0;
        $pushTo   = \ @lines;
    }

    push @$pushTo, $_;
}

close $inFH or die qq{close: $!\n};

print Data::Dumper->Dumpxs(
    [ \ @headers, \ @lines ],
    [ qw{ *headers *lines } ],
);

Here's the output.

@headers = (
             '>1gmz_B mol:protein length:122 PHOSPHOLIPASE A2',
             ' Length = 122',
             '',
             ' Score = 103 bits (233), Expect = 9e-23',
             ' Identities = 56/124 (45%), Positives = 67/124 (54%), Gaps = 12/124 (9%)',
             '',
             '>1b4w_A mol:protein length:122 PROTEIN (PHOSPHOLIPASE A2)',
             ' Length = 122',
             '',
             ' Score = 95.7 bits (215), Expect = 2e-20',
             ' Identities = 46/105 (43%), Positives = 61/105 (58%), Gaps = 10/105 (9%)',
             ''
           );
@lines = (
           'Query: 2 LWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSC 61',
           ' LWQF MI K P + YGCYCG+GG G P D DRCC HD CY KL SC',
           'Sbjct: 2 LWQFGKMI-LKETGKLPFPYYVTYGCYCGVGGRGGPKDATDRCCFVHDCCY---GKLTSC 57',
           '',
           'Query: 62 KVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICFSK--VPYNKEHKNLD 119',
           ' K P T+ YSYS + I C EN+ C IC CD+ AA+CF + YNK++ + ',
           'Sbjct: 58 K-----PKTDRYSYSRKDGTIVC-GENDPCRKEICECDKAAAVCFRENLDTYNKKYMSYL 111',
           '',
           'Query: 120 KKNC 123',
           ' K C',
           'Sbjct: 112 KSLC 115',
           '',
           'Query: 2 LWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSC 61',
           ' L QF MIK K+ EP++ + YGCYCG GG G P D DRCC HD CY +K+ C',
           'Sbjct: 2 LLQFRKMIK-KMTGKEPVVSYAFYGCYCGSGGRGKPKDATDRCCFVHDCCY---EKVTGC 57',
           '',
           'Query: 62 KVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICF 106',
           ' +P ++Y+YS N I C + + C+ +C CD+ AAICF',
           'Sbjct: 58 -----DPKWDDYTYSWKNGTIVCGGD-DPCKKEVCECDKAAAICF 96',
           ''
         );

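Some good habits to get into, all used in the script above: use strict and use warnings, the three-argument form of open with a lexical filehandle, and checking that open and close succeeded.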

Regarding the best website to learn Perl, I think you've already found it!

Cheers,

JohnGG

Update: The array ref. is not necessary as you can use a ternary to determine which array to push onto.

push @{ $inHeader ? \ @headers : \ @lines }, $_;
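As a minimal sketch of the ternary version in context, here is the same loop run over a shortened, made-up sample (the record names and sequences below are illustrative only, not taken from the real report). The point to watch is that the ternary has to yield array references, because the @{ ... } block is evaluated in scalar context and a bare array there would give its element count instead.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample data standing in for the real report.
my @input = (
    '>rec1 example header',
    ' Length = 4',
    'Query: 1 ABCD 4',
    'Sbjct: 1 ABCE 4',
    '>rec2 example header',
    ' Length = 2',
    'Query: 1 AB 2',
    'Sbjct: 1 AB 2',
);

my $inHeader = 0;
my @headers;
my @lines;

for ( @input )
{
    # Same state transitions as in the main script, minus $pushTo.
    $inHeader = 1 if not $inHeader and m{^>};
    $inHeader = 0 if $inHeader and m{^Query:};

    # The ternary picks a reference to the array for the current section.
    push @{ $inHeader ? \ @headers : \ @lines }, $_;
}

# Prints "4 header lines, 4 alignment lines"
print scalar @headers, qq{ header lines, }, scalar @lines, qq{ alignment lines\n};
```

This drops one variable at the cost of evaluating the ternary on every line, which should not matter for input of this size.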