in reply to Re: Re: Re: sorting according to greek alphabet in roman letters
in thread sorting according to greek alphabet in roman letters

I do apologise, I've spent so much time thinking biochemically that it's all I can speak sometimes!

For starters, I'm going to rewrite the tyrosine list I posted in my original reply, but formatted according to the columns so that it can be seen better, and I've added a little ASCII art to show the residue in '3D':

1234
 N
 CA
 C
 O
 CB
 CG
 CD1
 CD2
 CE1
 CE2
 CZ
 OH
 H
2HB
3HB
 HD1
 HD2
 HE1
 HE2
 HH

           HD1 HE1
            |   |
    1HB    CD1-CE1
H-N  |    /       \
  CA-CB-CG         CZ-OH-HH
  C  |    \       /
  O 2HB    CD2-CE2
            |   |
           HD2 HE2

I'm only dealing with single-letter elements here, and those elements are:

C O N H

so in column two, you would only find those letters.

The main chain is the ONLY part that has single letters for the name: C, O & N. In ADDITION, CA is in the main chain; it's the first atom of the side chain, so it's 'alpha'.

All other names are AT LEAST two letters, indicating the element and its distance (A, B, G, D, E, Z, H for alpha, beta, gamma, delta, epsilon, zeta, eta).

If more than one atom would have the SAME NAME, and it's not 'H', then a quantifier is added at the end of the name to distinguish between them: 1, 2 and so on.

If it's 'H', then even with the quantifier there can be more than one 'H' with the same name, so 'H' names get an extra quantifier if needed, which is added at the beginning of the name.
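
To make those naming rules concrete, here is a rough sketch of how a name could be split into its parts with one regex. It only covers the four single-letter elements mentioned above, and the sub name and the hash keys (hyd, el, remote, branch) are made up for illustration; they are not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split an atom name into its parts.
# Assumes only the single-letter elements C, N, O and H.
sub parse_atom_name {
    my ($name) = @_;
    my ($hyd, $el, $remote, $branch) =
        $name =~ /^(\d?)([CNOH])([ABGDEZH]?)(\d?)$/
        or return;    # name does not follow the rules above
    return {
        hyd    => $hyd,       # leading quantifier, protons only ('' if absent)
        el     => $el,        # element letter
        remote => $remote,    # Greek-letter distance ('' for N, C, O and H)
        branch => $branch,    # trailing quantifier ('' if absent)
    };
}

for my $name (qw(N CA CD1 OH 2HB HH)) {
    my $p = parse_atom_name($name) or next;
    printf "%-4s el=%s remote=%s branch=%s hyd=%s\n",
        $name, $p->{el},
        $p->{remote} || '-', $p->{branch} || '-', $p->{hyd} || '-';
}
```

Note that 'OH' and 'HH' come out as element plus remoteness 'H' (eta), not as two elements, which is exactly why the element has to be read from the first letter only.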

Thanks for the code though, you're right in saying that the devil is in the details, and from what you coded I can 'fine-tune' to get the result I'd like.

I'd already started to code my own way: I'd separated the atoms into main chain, side chain and protons. The actual length of the numbers is what drives the ordering once the keys are sorted numerically, so for the main chain I only used one digit, 1-9. Then for the side chain I used double digits, 51-59. Then for the protons I used triple digits:

#########sort atoms into three groups#########
foreach my $a (@names){
    push @main,   $self->{'all'}{$a} if $self->{'all'}{$a}->mainHeavy;
    push @heavy,  $self->{'all'}{$a} if $self->{'all'}{$a}->sideHeavy;
    push @proton, $self->{'all'}{$a} if $self->{'all'}{$a}->proton;
    $names{$self->{'all'}{$a}->atomName}=$a;
}

##########do main chains, single digits############
foreach my $a (@main){
    if($a->atomName eq 'N'){
        $main{1}=$a;
    }elsif($a->atomName eq 'CA'){
        $main{2}=$a;
    }elsif($a->atomName eq 'C'){
        $main{3}=$a;
    }elsif($a->atomName eq 'O'){
        $main{4}=$a;
    }
}

#########Do side chain################
my %heavyweights = ( 'C' => 1, 'N' => 2, 'O' => 3);
my %greekweights = ( 'A' => 1, 'B' => 2, 'G' => 3, 'D' => 4,
                     'E' => 5, 'Z' => 6, 'H' => 7 );
foreach my $a (@heavy){
    $main{$heavyweights{$a->atomEl}.$greekweights{$a->atomRemote}} = $a;
}

####Do protons############
#I actually used 4 digits, because of the proton quantifier
#itself
foreach my $a (@proton){
    print $a->atomName;
    my $n = '9';
    $n .= $greekweights{$a->atomRemote} if $a->atomRemote;
    $n .= 0 unless $a->atomRemote;
    $n .= $a->atomBranch if $a->atomBranch;
    $n .= 0 unless $a->atomBranch;
    $n .= $a->hydNumber if $a->hydNumber;
    $n .= 0 unless $a->hydNumber;
    $main{$n} = $a;
}

foreach my $i ( sort{ $a<=>$b } keys %main ){
    print $main{$i}->atomName." $i\n";
    push @result, $main{$i};
}
__END__
N 1
CA 2
C 3
O 4
CB 12
CG 13
CD 14
NE2 25
OE1 35
H 9000
HA 9100
HB 9202
HB 9203
HG 9302
HG 9303
HE2 9521
HE2 9522

My own needed fine tuning, but I posted the code here because you could use the same method of numerical lengths to do a sort using 'cmp', am I right?
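
One caveat on the 'cmp' point that is easy to check: 'cmp' compares as strings, so it does not group keys by their length, whereas the numeric '<=>' does. A minimal sketch, using a handful of the key values from the output above (the sub names are made up for illustration), which also shows that zero-padding the keys to a fixed width should make a plain string sort behave:

```perl
use strict;
use warnings;

# A few of the keys from the output above, shuffled.
my @keys = qw(9000 12 1 35 2 9100 13 3 25 4 14);

# Numeric comparison: fewer digits means a smaller number, so groups stay together.
sub sort_numeric { return sort { $a <=> $b } @_ }

# String comparison: compares character by character, so '12' sorts before '2'.
sub sort_string  { return sort { $a cmp $b } @_ }

# Zero-pad to four digits, sort as strings, then strip the padding again.
sub sort_padded {
    return map { $_ + 0 }
           sort { $a cmp $b }
           map { sprintf '%04d', $_ } @_;
}

print join(' ', sort_numeric(@keys)), "\n";  # 1 2 3 4 12 13 14 25 35 9000 9100
print join(' ', sort_string(@keys)),  "\n";  # 1 12 13 14 2 25 3 35 4 9000 9100
print join(' ', sort_padded(@keys)),  "\n";  # 1 2 3 4 12 13 14 25 35 9000 9100
```

So the length trick seems to need either '<=>' or fixed-width keys; with bare 'cmp' the side chain atoms end up mixed in with the main chain.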

But cheers mate!
Sam