I have output like the following, containing two sets of alignments.
From it, I want the first 4 lines of the 1st set and the first 4 lines of the 2nd set in one array, and likewise the alignment part of the 1st and the 2nd set in another array.
The two sets of alignments are separated by a '>' symbol.
>1gmz_B mol:protein length:122 PHOSPHOLIPASE A2
Length = 122
Score = 103 bits (233), Expect = 9e-23
Identities = 56/124 (45%), Positives = 67/124 (54%), Gaps = 12/124 (9%)
Query: 2 LWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSC 61
LWQF MI K P + YGCYCG+GG G P D DRCC HD CY KL SC
Sbjct: 2 LWQFGKMI-LKETGKLPFPYYVTYGCYCGVGGRGGPKDATDRCCFVHDCCY---GKLTSC 57
Query: 62 KVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICFSK--VPYNKEHKNLD 119
K P T+ YSYS + I C EN+ C IC CD+ AA+CF + YNK++ +
Sbjct: 58 K-----PKTDRYSYSRKDGTIVC-GENDPCRKEICECDKAAAVCFRENLDTYNKKYMSYL 111
Query: 120 KKNC 123
K C
Sbjct: 112 KSLC 115
>1b4w_A mol:protein length:122 PROTEIN (PHOSPHOLIPASE A2)
Length = 122
Score = 95.7 bits (215), Expect = 2e-20
Identities = 46/105 (43%), Positives = 61/105 (58%), Gaps = 10/105 (9%)
Query: 2 LWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSC 61
L QF MIK K+ EP++ + YGCYCG GG G P D DRCC HD CY +K+ C
Sbjct: 2 LLQFRKMIK-KMTGKEPVVSYAFYGCYCGSGGRGKPKDATDRCCFVHDCCY---EKVTGC 57
Query: 62 KVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICF 106
+P ++Y+YS N I C + + C+ +C CD+ AAICF
Sbjct: 58 -----DPKWDDYTYSWKNGTIVCGGD-DPCKKEVCECDKAAAICF 96
This is what I have done so far:
#!/usr/bin/perl
use strict;
use warnings;

my $file = "/home/guest/align.txt";
open my $fh, '<', $file or die "Cannot open $file: $!";
chomp(my @lines = <$fh>);    # read the output file line by line
close $fh;

print "<pre>";
my (@header, @lin);
foreach my $line (@lines) {
    if ($line =~ /^>/ || $line =~ /^\s*(?:Length|Score|Identities)\b/) {
        push @header, $line;    # header information
    }
    elsif ($line =~ /\S/) {
        push @lin, $line;       # Query, match and Sbjct lines
    }
}
print "$_\n" foreach @header;
print "$_\n" foreach @lin;
print "</pre>\n";
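Since the two sets are separated by a '>' line, one way to keep them apart might be to start a new pair of arrays whenever such a line is seen. The following is only a rough sketch: parse_alignments() is a made-up helper name, the sample text is a trimmed excerpt of the output above rather than the real align.txt, and it assumes that each set's header is exactly its first 4 lines (with the Identities line on a single line in the file).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed excerpt of the output above, used here instead of align.txt
# so that the sketch runs on its own.
my $text = <<'END';
>1gmz_B mol:protein length:122 PHOSPHOLIPASE A2
Length = 122
Score = 103 bits (233), Expect = 9e-23
Identities = 56/124 (45%), Positives = 67/124 (54%), Gaps = 12/124 (9%)
Query: 120 KKNC 123
K C
Sbjct: 112 KSLC 115
>1b4w_A mol:protein length:122 PROTEIN (PHOSPHOLIPASE A2)
Length = 122
Score = 95.7 bits (215), Expect = 2e-20
Identities = 46/105 (43%), Positives = 61/105 (58%), Gaps = 10/105 (9%)
Query: 62 KVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICF 106
+P ++Y+YS N I C + + C+ +C CD+ AAICF
Sbjct: 58 -----DPKWDDYTYSWKNGTIVCGGD-DPCKKEVCECDKAAAICF 96
END

# parse_alignments() is a hypothetical helper, not part of any module.
# It returns two parallel arrays of array refs: one header block and
# one alignment block per '>' record.
sub parse_alignments {
    my ($blast) = @_;
    my (@headers, @alignments);
    for my $line (split /\n/, $blast) {
        next unless $line =~ /\S/;              # skip blank lines
        if ($line =~ /^>/) {                    # a '>' line starts a new set
            push @headers,    [];
            push @alignments, [];
        }
        next unless @headers;                   # ignore text before the first '>'
        if (@{ $headers[-1] } < 4) {
            push @{ $headers[-1] }, $line;      # assumed: first 4 lines are the header
        }
        else {
            push @{ $alignments[-1] }, $line;   # everything else is the alignment
        }
    }
    return (\@headers, \@alignments);
}

my ($headers, $alignments) = parse_alignments($text);
for my $i (0 .. $#$headers) {
    print "Set ", $i + 1, " header:\n";
    print "  $_\n" for @{ $headers->[$i] };
    print "Set ", $i + 1, " alignment:\n";
    print "  $_\n" for @{ $alignments->[$i] };
}
```

With this layout, $headers->[0] holds the 4 header lines of the 1st set and $alignments->[1] holds the alignment lines of the 2nd set, so the sets never get mixed together. To run it on the real file, the contents of align.txt would be read into $text instead of the here-document.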
I'm sure there are better ways to do this. Also, which is the best website for learning Perl?