Dear all,

Though this is a reply to meetraz, it involves having read the next 10 or so replies too.

I'm now posting an example of the file data that I use, and the pseudo-algorithm

But before I do so, I want to say that probably the best reply of the lot came from 'toma' I will truly go away and study computational geomtry now, thanks toma!

File:

ATOM 1 N GLY A 1 43.202 54.570 86.432 1.00 44.15 + N ATOM 2 CA GLY A 1 44.109 54.807 85.249 1.00 45.94 + C ATOM 3 C GLY A 1 44.984 56.034 85.443 1.00 43.20 + C ATOM 4 O GLY A 1 45.070 56.527 86.578 1.00 46.54 + O ATOM 5 N SER A 2 45.617 56.538 84.378 1.00 38.14 + N ATOM 6 CA SER A 2 46.461 57.710 84.544 1.00 33.00 + C ATOM 7 C SER A 2 46.057 59.017 83.842 1.00 29.38 + C ATOM 8 O SER A 2 46.522 60.071 84.270 1.00 32.21 + O ATOM 9 CB SER A 2 47.972 57.377 84.387 1.00 29.10 + C ATOM 10 OG SER A 2 48.326 56.931 83.084 1.00 25.44 + O ATOM 11 N HIS A 3 45.172 59.000 82.838 1.00 24.08 + N ATOM 12 CA HIS A 3 44.778 60.271 82.173 1.00 20.59 + C ATOM 13 C HIS A 3 43.299 60.425 81.787 1.00 18.87 + C ATOM 14 O HIS A 3 42.664 59.463 81.375 1.00 16.14 + O ATOM 15 CB HIS A 3 45.597 60.514 80.897 1.00 17.13 + C ATOM 16 CG HIS A 3 47.060 60.648 81.145 1.00 20.02 + C ATOM 17 ND1 HIS A 3 47.596 61.677 81.887 1.00 20.09 + N ATOM 18 CD2 HIS A 3 48.099 59.855 80.797 1.00 21.10 + C ATOM 19 CE1 HIS A 3 48.904 61.516 81.989 1.00 19.86 + C ATOM 20 NE2 HIS A 3 49.233 60.417 81.333 1.00 18.86 + N
The co-ordinates are indeed cartesian, the last three columns are meaningless.

The other important columns are the second through to the sixth columns, in order:

  1. atom number
  2. atom name
  3. residue name
  4. chain id
  5. residue number

I should add that each line is for an atom and not for a residue, though I was originally talking about residue iteration, I use residue iteration to try and avoid any extra atomic iteration, but in reality, the number of lines is about 10-20 times the number of residues themselves, depending on the residue type

The pseudo-code, which I'l post below, will ignore the format listed above, because each line is for one atom, and I have a coded PDB::Atom object (home-made) that takes the line as it's read, and parses it.

For the sake of making the psuedo-code concise, it is indeed 'PSEUDO' so please dont expect it to run at all!

The function 'addToMemory' simply fills the different hashes I use for lookup, especially to recall all the atoms in a residue. It is in there also, that the atomic data is added to the database, so there is one DB call per line in the file, this is one bottleneck, but much less of a bottleneck than the sheer number of iterations themselves

I try to cut down on the iterations, by pre-calculating the 3D center of the residue, and then comparing the distance of a residue pair to a hard-coded cut-off (varies depending on residues themselves) in 'notClose' (not shown). This avoids having to iterate through all the atoms in a residue, if there is no chance of a bond.

Finally the bond detection itself is in another function, not shown, and doesn't necessarily return a bond, there is more calculations depending on the nature of the atom itself. I just wanted to show the nature of the iterations themselves.

It should be noted that there is a large amount of processing for particular residues and atoms due to possible errors in the file, which I've excluded.

open (FILE, "< ".$file ) or die "\ncouldn't open FILE: ".$file.": $! +"; my $i=0; while(<FILE>){ if($_ =~ /^ATOM/){ $temp = new PDB::Atom('-line' => $_, '-number'=>$i); addToMemory($self, $temp, $i); ########bond detection ################# ###iterate through chains### foreach my $ch (sort {$a <=> $b} keys %{$self->{'contacts'}){ next if $temp->chain eq $ch; #skip same chains ###iterate through residues### foreach my $r (sort {$a <=> $b} keys %{$self->{'contacts'}{$ch +}}){ next if notClose($temp,self->{'contacts'}{$ch}{$r}); foreach my $a (sort {$a <=> $b} keys %{$self->{'residues'} +{$ch.$r}}){ $bonds->newBond($temp,$self->{'all'}{$ +a}); } } } } } close(FILE); sub addToMemory{ my ($self, $atom, $numb) = @_; $self->{'all'}{$numb}=$atom; unless($atom->proton){ $self->{'atoms'}{$numb} = 1; }else{ $self->{'protons'}{$numb} = 1; } $self->{'contacts'}{$atom->chain}{$atom->resNumber}=residualAverag +e($atom, $self->{'contacts'}{$atom->chain}{$atom->resNumber}); $self->{'residues'}{$ch.$atom->resNumber}{$numb}=1; }

Edited by Chady -- converted pre to code tags.


In reply to Re: Iteration speed by seaver
in thread Iteration speed by seaver

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.