in reply to Re: Iteration speed
in thread Iteration speed
Though this is a reply to meetraz, it involves having read the next 10 or so replies too.
I'm now posting an example of the file data that I use, and the pseudo-algorithm
But before I do so, I want to say that probably the best reply of the lot came from 'toma' I will truly go away and study computational geomtry now, thanks toma!
File:
The co-ordinates are indeed cartesian, the last three columns are meaningless.ATOM 1 N GLY A 1 43.202 54.570 86.432 1.00 44.15 + N ATOM 2 CA GLY A 1 44.109 54.807 85.249 1.00 45.94 + C ATOM 3 C GLY A 1 44.984 56.034 85.443 1.00 43.20 + C ATOM 4 O GLY A 1 45.070 56.527 86.578 1.00 46.54 + O ATOM 5 N SER A 2 45.617 56.538 84.378 1.00 38.14 + N ATOM 6 CA SER A 2 46.461 57.710 84.544 1.00 33.00 + C ATOM 7 C SER A 2 46.057 59.017 83.842 1.00 29.38 + C ATOM 8 O SER A 2 46.522 60.071 84.270 1.00 32.21 + O ATOM 9 CB SER A 2 47.972 57.377 84.387 1.00 29.10 + C ATOM 10 OG SER A 2 48.326 56.931 83.084 1.00 25.44 + O ATOM 11 N HIS A 3 45.172 59.000 82.838 1.00 24.08 + N ATOM 12 CA HIS A 3 44.778 60.271 82.173 1.00 20.59 + C ATOM 13 C HIS A 3 43.299 60.425 81.787 1.00 18.87 + C ATOM 14 O HIS A 3 42.664 59.463 81.375 1.00 16.14 + O ATOM 15 CB HIS A 3 45.597 60.514 80.897 1.00 17.13 + C ATOM 16 CG HIS A 3 47.060 60.648 81.145 1.00 20.02 + C ATOM 17 ND1 HIS A 3 47.596 61.677 81.887 1.00 20.09 + N ATOM 18 CD2 HIS A 3 48.099 59.855 80.797 1.00 21.10 + C ATOM 19 CE1 HIS A 3 48.904 61.516 81.989 1.00 19.86 + C ATOM 20 NE2 HIS A 3 49.233 60.417 81.333 1.00 18.86 + N
The other important columns are the second through to the sixth columns, in order:
I should add that each line is for an atom and not for a residue, though I was originally talking about residue iteration, I use residue iteration to try and avoid any extra atomic iteration, but in reality, the number of lines is about 10-20 times the number of residues themselves, depending on the residue type
The pseudo-code, which I'l post below, will ignore the format listed above, because each line is for one atom, and I have a coded PDB::Atom object (home-made) that takes the line as it's read, and parses it.
For the sake of making the psuedo-code concise, it is indeed 'PSEUDO' so please dont expect it to run at all!
The function 'addToMemory' simply fills the different hashes I use for lookup, especially to recall all the atoms in a residue. It is in there also, that the atomic data is added to the database, so there is one DB call per line in the file, this is one bottleneck, but much less of a bottleneck than the sheer number of iterations themselves
I try to cut down on the iterations, by pre-calculating the 3D center of the residue, and then comparing the distance of a residue pair to a hard-coded cut-off (varies depending on residues themselves) in 'notClose' (not shown). This avoids having to iterate through all the atoms in a residue, if there is no chance of a bond.
Finally the bond detection itself is in another function, not shown, and doesn't necessarily return a bond, there is more calculations depending on the nature of the atom itself. I just wanted to show the nature of the iterations themselves.
It should be noted that there is a large amount of processing for particular residues and atoms due to possible errors in the file, which I've excluded.
open (FILE, "< ".$file ) or die "\ncouldn't open FILE: ".$file.": $! +"; my $i=0; while(<FILE>){ if($_ =~ /^ATOM/){ $temp = new PDB::Atom('-line' => $_, '-number'=>$i); addToMemory($self, $temp, $i); ########bond detection ################# ###iterate through chains### foreach my $ch (sort {$a <=> $b} keys %{$self->{'contacts'}){ next if $temp->chain eq $ch; #skip same chains ###iterate through residues### foreach my $r (sort {$a <=> $b} keys %{$self->{'contacts'}{$ch +}}){ next if notClose($temp,self->{'contacts'}{$ch}{$r}); foreach my $a (sort {$a <=> $b} keys %{$self->{'residues'} +{$ch.$r}}){ $bonds->newBond($temp,$self->{'all'}{$ +a}); } } } } } close(FILE); sub addToMemory{ my ($self, $atom, $numb) = @_; $self->{'all'}{$numb}=$atom; unless($atom->proton){ $self->{'atoms'}{$numb} = 1; }else{ $self->{'protons'}{$numb} = 1; } $self->{'contacts'}{$atom->chain}{$atom->resNumber}=residualAverag +e($atom, $self->{'contacts'}{$atom->chain}{$atom->resNumber}); $self->{'residues'}{$ch.$atom->resNumber}{$numb}=1; }
Edited by Chady -- converted pre to code tags.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Iteration speed
by BrowserUk (Patriarch) on Jun 16, 2004 at 18:32 UTC | |
|
Re^2: Iteration speed
by seaver (Pilgrim) on Jun 16, 2004 at 19:35 UTC | |
by BrowserUk (Patriarch) on Jun 16, 2004 at 20:35 UTC |