in reply to Iteration speed
Re: Iteration speed
by seaver (Pilgrim) on Jun 16, 2004 at 14:32 UTC
Though this is a reply to meetraz, it takes the next 10 or so replies into account too. I'm now posting an example of the file data that I use, and the pseudo-algorithm. But before I do so, I want to say that probably the best reply of the lot came from 'toma'. I will truly go away and study computational geometry now. Thanks, toma!

File:

The co-ordinates are indeed Cartesian; the last three columns are meaningless. The other important columns are the second through to the sixth, in order:
I should add that each line is for an atom and not for a residue. Though I was originally talking about residue iteration, I use residue iteration to try to avoid any extra atomic iteration. In reality, the number of lines is about 10-20 times the number of residues, depending on the residue type.

The pseudo-code, which I'll post below, ignores the format listed above, because each line is for one atom, and I have a home-made PDB::Atom object that takes the line as it's read and parses it. For the sake of keeping the pseudo-code concise, it really is 'PSEUDO', so please don't expect it to run at all!

The function 'addToMemory' simply fills the different hashes I use for lookup, especially to recall all the atoms in a residue. It is also there that the atomic data is added to the database, so there is one DB call per line in the file. This is one bottleneck, but much less of one than the sheer number of iterations.

I try to cut down on the iterations by pre-calculating the 3D center of each residue, and then comparing the distance between a residue pair to a hard-coded cut-off (which varies depending on the residues themselves) in 'notClose' (not shown). This avoids having to iterate through all the atoms in a residue if there is no chance of a bond.

Finally, the bond detection itself is in another function, not shown, and doesn't necessarily return a bond; there are more calculations depending on the nature of the atom itself. I just wanted to show the nature of the iterations themselves. It should be noted that there is a large amount of processing for particular residues and atoms due to possible errors in the file, which I've excluded.
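A minimal sketch of the residue-center pre-filter described above, in runnable Perl. The data layout (a residue as an array ref of [x, y, z] atom coordinates), the sub names, and the single cut-off are assumptions made for illustration; the original 'notClose' is not shown and uses cut-offs that vary by residue pair.

```perl
use strict;
use warnings;

# Hypothetical layout: a residue is an array ref of [x, y, z] atom coordinates.

# Mean position of a residue's atoms.
sub centroid {
    my ($atoms) = @_;
    my ($sx, $sy, $sz) = (0, 0, 0);
    for my $atom (@$atoms) {
        $sx += $atom->[0];
        $sy += $atom->[1];
        $sz += $atom->[2];
    }
    my $n = @$atoms;
    return [ $sx / $n, $sy / $n, $sz / $n ];
}

# True when two centers are farther apart than $cutoff.
# Squared distances are compared, so no sqrt is needed.
sub not_close {
    my ($c1, $c2, $cutoff) = @_;
    my $dx = $c1->[0] - $c2->[0];
    my $dy = $c1->[1] - $c2->[1];
    my $dz = $c1->[2] - $c2->[2];
    return ($dx * $dx + $dy * $dy + $dz * $dz) > $cutoff * $cutoff;
}

# Index pairs [i, j] of residues close enough to deserve atom-level checks.
sub candidate_pairs {
    my ($residues, $cutoff) = @_;
    my @centers = map { centroid($_) } @$residues;   # computed once per residue
    my @pairs;
    for my $i (0 .. $#centers - 1) {
        for my $j ($i + 1 .. $#centers) {
            next if not_close($centers[$i], $centers[$j], $cutoff);
            push @pairs, [$i, $j];
        }
    }
    return \@pairs;
}
```

Only the pairs returned by candidate_pairs would then go through the per-atom loops. The residue-pair loop itself is still quadratic; the saving is in the atom-level work that gets skipped.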
Edited by Chady -- converted pre to code tags.
by BrowserUk (Patriarch) on Jun 16, 2004 at 18:32 UTC
Sorry, but your pseudo-code is just a little too pseudo. Without a clear understanding of the internals of the PDB::Atom objects and their methods, I find it impossible to understand what is going on.

One casual comment though. In general, Perl's objects carry a considerable performance penalty relative to its standard hashes and arrays. Iterating over large volumes of data represented as objects will be much slower than processing the same data stored in a hash or an array. This is no surprise, nor a criticism of Perl, just a statement of fact. The extra levels of indirection and lookup are bound to carry a penalty, and when dealing with large volumes where speed is a criterion, they are best avoided.
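To make the comparison concrete, here is a small sketch of the two access styles. Demo::Atom is a hypothetical stand-in, not the poster's PDB::Atom, and both sub names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an atom class with an accessor method.
package Demo::Atom;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub x   { $_[0]{x} }

package main;

# Sum the x coordinates through method calls: one method dispatch
# plus one hash lookup per atom.
sub sum_via_methods {
    my ($atoms) = @_;
    my $total = 0;
    $total += $_->x for @$atoms;
    return $total;
}

# Same sum over plain [x, y, z] array refs: one array index per atom.
sub sum_via_arrays {
    my ($coords) = @_;
    my $total = 0;
    $total += $_->[0] for @$coords;
    return $total;
}
```

To measure the difference on your own data, the core Benchmark module can time both, for example cmpthese(-2, { methods => sub { sum_via_methods(\@objs) }, arrays => sub { sum_via_arrays(\@rows) } }). The size of the gap depends on the Perl version and the data, so it is worth measuring rather than assuming.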
by seaver (Pilgrim) on Jun 16, 2004 at 19:35 UTC
Dear all, I've started a profile run on one of my biggest files (15 chains!) and here are the most telling results:
%Time       Sec.    #calls    sec/call  F  name
17.79  1043.2695  10727535    0.000097     PDB::Bonds::notClose
14.88   872.6432  14509596    0.000060     PDB::Writer::numberFormat
10.52   616.7291  14509596    0.000043     PDB::Bond::dist
 6.89   403.8085  14509597    0.000028     PDB::Writer::pad_left
 6.40   375.5808         1  375.580791  ?  HighPDB::parse
 6.39   374.5854  14509596    0.000026     PDB::Writer::pad_right
 5.54   325.1066  43586508    0.000007     UNIVERSAL::isa
 4.89   286.5572   1881489    0.000152     PDB::Bond::new
 3.49   204.5796  18291657    0.000011     PDB::Atom::x
 3.42   200.8238  18291657    0.000011     PDB::Atom::y
 3.23   189.6586  18291657    0.000010     PDB::Atom::z
 3.14   184.3266   1881489    0.000098     PDB::Bond::isHydphb
 3.14   184.2096   1880381    0.000098     PDB::Bond::isElcsta
 2.24   131.0808         1  131.080769     WhatIf::doWhatif
 1.91   111.7546  10730691    0.000010     PDB::Atom::resName

The code for the first three subroutines is shown here:
I had totally forgotten that I use numberFormat to manipulate the result of the sqrt function. (This is essential for the DB.) I'm now going to move this to the DB part, so that it only gets called when adding 'real' bonds to the DB. I'm also going to remove the UNIVERSAL::isa calls, and just try to ASSUME $self whenever I can.

Thanks for all the help, and I'm still investigating Mr. Delaunay.
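A rough sketch of that change, under stated assumptions: format_dist is a hypothetical stand-in for numberFormat (three decimals here), points are plain [x, y, z] array refs, and a single distance limit replaces the real bond tests. The point is only the ordering: reject on the raw squared distance first, then take the sqrt and format for the pairs that are kept.

```perl
use strict;
use warnings;

# Squared distance between two [x, y, z] points: no sqrt, no formatting.
sub dist_sq {
    my ($p, $q) = @_;
    my $dx = $p->[0] - $q->[0];
    my $dy = $p->[1] - $q->[1];
    my $dz = $p->[2] - $q->[2];
    return $dx * $dx + $dy * $dy + $dz * $dz;
}

# Hypothetical formatter standing in for numberFormat.
sub format_dist { return sprintf '%.3f', $_[0] }

# Returns [i, j, formatted_distance] for each pair within $max_dist.
sub bonds_within {
    my ($points, $max_dist) = @_;
    my $max_sq = $max_dist * $max_dist;
    my @bonds;
    for my $i (0 .. $#$points - 1) {
        for my $j ($i + 1 .. $#$points) {
            my $d2 = dist_sq($points->[$i], $points->[$j]);
            next if $d2 > $max_sq;    # rejected pairs are never formatted
            push @bonds, [ $i, $j, format_dist(sqrt $d2) ];
        }
    }
    return \@bonds;
}
```

In the profile above, numberFormat, pad_left and pad_right are each called about 14.5 million times, while PDB::Bond::new is called about 1.9 million times, so formatting only accepted bonds should remove most of those calls.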
Cheers
by BrowserUk (Patriarch) on Jun 16, 2004 at 20:35 UTC
FWIW, here is the code I mentioned earlier. It just completed a run looking for pairs of atoms that are within 0.01 units of each other. The input was 1,000,000 randomly generated atom coordinates, in a file that looks like this:
It found 308,822 pairs within the requisite distance of one another (from the 1,000,000,000,000 possible pairs) in just under 7 hours on a 2 GHz machine. Whether the technique is adaptable to your application, I'm not sure.
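The code behind this run is not reproduced in this thread, so the following is not that technique. It is a generic uniform-grid sketch of the same task (find all pairs of points closer than a given radius), with close_pairs and the [x, y, z] array-ref layout being assumptions for illustration.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Returns [i, j] index pairs (i < j) of points closer than $r.
# Points are [x, y, z] array refs.
sub close_pairs {
    my ($points, $r) = @_;
    my $r_sq = $r * $r;

    # Bucket point indices by the grid cell (edge length $r) they fall in.
    # The "+ 0" normalises a possible -0 so cell keys stringify consistently.
    my (%cell, @key);
    for my $i (0 .. $#$points) {
        my @c = map { floor($_ / $r) + 0 } @{ $points->[$i] };
        $key[$i] = \@c;
        push @{ $cell{"@c"} }, $i;
    }

    my @pairs;
    for my $i (0 .. $#$points) {
        my ($cx, $cy, $cz) = @{ $key[$i] };
        my $p = $points->[$i];
        # A point within $r must lie in the same cell or one of its 26 neighbours.
        for my $dx (-1, 0, 1) {
            for my $dy (-1, 0, 1) {
                for my $dz (-1, 0, 1) {
                    my $bucket = $cell{ join ' ', $cx + $dx, $cy + $dy, $cz + $dz }
                        or next;
                    for my $j (@$bucket) {
                        next if $j <= $i;    # report each pair once
                        my $q  = $points->[$j];
                        my $d2 = ($p->[0] - $q->[0])**2
                               + ($p->[1] - $q->[1])**2
                               + ($p->[2] - $q->[2])**2;
                        push @pairs, [ $i, $j ] if $d2 < $r_sq;
                    }
                }
            }
        }
    }
    return \@pairs;
}
```

Each point is compared only against points in neighbouring cells, so for roughly uniform data the work grows with the number of points times the local density, not with the square of the point count. Memory use grows with the number of occupied cells, which may matter at the 1,000,000-point scale mentioned above.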