I do apologise, I've spent so much time thinking biochemically that it's all I can speak sometimes!
For starters, I'm going to rewrite the tyrosine list I posted in my original reply, but formatted according to the columns so that it can be seen better, and I've added a little ASCII art to show the residue in '3D'.
1234
N
CA
C
O
CB
CG
CD1
CD2
CE1
CE2
CZ
OH
H
2HB
3HB
HD1
HD2
HE1
HE2
HH
         HD1 HE1
          |   |
     1HB CD1-CE1
H-N   |  /      \
  CA-CB-CG       CZ-OH-HH
  C   |  \      /
  O  2HB CD2-CE2
          |   |
         HD2 HE2
I'm only dealing with single-letter elements here, and those elements are:
C O N H
so in column two you would only find those letters.
The main chain is the ONLY part that has single-letter names: C, O and N. However, in ADDITION, CA is in the main chain; it's the first atom of the side chain, so it's 'alpha'.
All other names are AT LEAST two letters, indicating the element and its distance from the main chain (alpha, beta, gamma and so on, written A, B, G, D, E, Z, H).
If more than one atom would end up with the SAME NAME, and it's not an 'H', then a quantifier is added at the end of the name to distinguish between them: 1, 2 and so on.
If it is an 'H', then even with that quantifier there can be more than one 'H' with the same name, so 'H' names get an extra quantifier if needed, which is added at the beginning of the name.
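The naming rules above can be sketched as a small parser. This is only a minimal sketch: parse_atom_name is a hypothetical helper, not part of the module used further down, and the hash keys are simply borrowed from the accessor names in that code (atomEl, atomRemote, atomBranch, hydNumber). It assumes the four elements and seven remoteness letters mentioned in this post and nothing else.

```perl
use strict;
use warnings;

# Hypothetical helper: split an atom name into the four fields described
# above. Returns a hash reference, or nothing if the name does not fit.
sub parse_atom_name {
    my ($name) = @_;

    # optional leading hydrogen number, one-letter element,
    # optional remoteness letter, optional trailing branch number
    my ( $hyd, $el, $remote, $branch ) =
        $name =~ /^(\d?)([CNOH])([ABGDEZH]?)(\d?)$/
        or return;

    return {
        hydNumber  => $hyd,       # '' when there is no leading digit
        atomEl     => $el,
        atomRemote => $remote,    # '' for the main-chain N, C, O and H
        atomBranch => $branch,    # '' when there is no trailing digit
    };
}

# Show the fields for a few of the tyrosine names listed earlier.
for my $name (qw(N CA CD1 OH H 2HB HE2 HH)) {
    my $p = parse_atom_name($name) or next;
    printf "%-4s el=%s remote=%s branch=%s hyd=%s\n",
        $name,
        $p->{atomEl},
        $p->{atomRemote} || '-',
        $p->{atomBranch} || '-',
        $p->{hydNumber}  || '-';
}
```

Names that do not fit the pattern (two-letter elements, for instance) are rejected rather than guessed at.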
Thanks for the code though, you're right in saying that the devil is in the details, and from what you coded I can 'fine-tune' to get the result I'd like.
I'd already started to code my own way: I'd separated the atoms into main chain, side chain and protons. The length of the number does the coarse ordering once the keys are sorted numerically (the code below uses '<=>'), so for the main chain I only used one digit, 1-9.
Then for the side chain I used double digits (element weight followed by Greek-letter weight, e.g. 12 for CB).
Then for the protons I used four digits, starting with a 9:
######### sort atoms into three groups #########
foreach my $name (@names) {
    my $atom = $self->{'all'}{$name};
    push @main,   $atom if $atom->mainHeavy;
    push @heavy,  $atom if $atom->sideHeavy;
    push @proton, $atom if $atom->proton;
    $names{ $atom->atomName } = $name;
}
########## do main chain, single digits ##########
my %mainorder = ( 'N' => 1, 'CA' => 2, 'C' => 3, 'O' => 4 );
foreach my $atom (@main) {
    my $n = $mainorder{ $atom->atomName };
    $main{$n} = $atom if $n;
}
######### do side chain, double digits #########
# key = element weight followed by Greek (remoteness) weight, e.g. CB => 12
my %heavyweights = ( 'C' => 1, 'N' => 2, 'O' => 3 );
my %greekweights = ( 'A' => 1, 'B' => 2, 'G' => 3, 'D' => 4,
                     'E' => 5, 'Z' => 6, 'H' => 7 );
foreach my $atom (@heavy) {
    # NB: the branch number is not part of the key, so atoms such as
    # CD1 and CD2 would share key 14 (one of the bits that needs fine-tuning)
    $main{ $heavyweights{ $atom->atomEl } . $greekweights{ $atom->atomRemote } } = $atom;
}
######### do protons #########
# I actually used 4 digits because of the proton quantifier itself:
# 9, then remoteness, branch and hydrogen number (0 where absent)
foreach my $atom (@proton) {
    # print $atom->atomName;    # debugging only
    my $n = '9';
    $n .= $atom->atomRemote ? $greekweights{ $atom->atomRemote } : 0;
    $n .= $atom->atomBranch || 0;
    $n .= $atom->hydNumber  || 0;
    $main{$n} = $atom;
}
######### print in sorted order #########
# loop variables above are named $atom so they cannot be confused
# with sort's own $a and $b
foreach my $i ( sort { $a <=> $b } keys %main ) {
    print $main{$i}->atomName . " $i\n";
    push @result, $main{$i};
}
__END__
N 1
CA 2
C 3
O 4
CB 12
CG 13
CD 14
NE2 25
OE1 35
H 9000
HA 9100
HB 9202
HB 9203
HG 9302
HG 9303
HE2 9521
HE2 9522
My own still needs fine-tuning, but I posted the code here because you could use the same method of numerical lengths to do a sort using 'cmp', am I right?
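A quick way to check the 'cmp' question, using keys taken from the listing above: a minimal sketch that sorts the same keys numerically, as plain strings, and as zero-padded strings. The padding width of 4 is an assumption based on the longest key in that listing.

```perl
use strict;
use warnings;

# Keys copied from the output listing above, deliberately shuffled.
my @keys = qw(9000 12 4 1 35 2 9100 13 3 25 14);

# Numeric comparison: shorter numbers come first, as intended.
my @numeric = sort { $a <=> $b } @keys;

# String comparison: compares character by character, so '12' sorts before '2'.
my @string = sort { $a cmp $b } @keys;

# Zero-padding every key to the same width makes 'cmp' agree with '<=>'.
my @padded = sort { $a cmp $b } map { sprintf '%04d', $_ } @keys;

print "numeric: @numeric\n";
print "string:  @string\n";
print "padded:  @padded\n";
```

With keys of different lengths, plain 'cmp' interleaves the groups, so the length trick relies on '<=>' unless every key is padded to a fixed width first.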
But cheers mate!
Sam