comment on

Dear all,

Though this is a reply to meetraz, it involves having read the next 10 or so replies too.

I'm now posting an example of the file data that I use, and the pseudo-algorithm

But before I do so, I want to say that probably the best reply of the lot came from 'toma' I will truly go away and study computational geomtry now, thanks toma!

File:

ATOM      1  N   GLY A   1      43.202  54.570  86.432  1.00 44.15    
+       N
ATOM      2  CA  GLY A   1      44.109  54.807  85.249  1.00 45.94    
+       C
ATOM      3  C   GLY A   1      44.984  56.034  85.443  1.00 43.20    
+       C
ATOM      4  O   GLY A   1      45.070  56.527  86.578  1.00 46.54    
+       O
ATOM      5  N   SER A   2      45.617  56.538  84.378  1.00 38.14    
+       N
ATOM      6  CA  SER A   2      46.461  57.710  84.544  1.00 33.00    
+       C
ATOM      7  C   SER A   2      46.057  59.017  83.842  1.00 29.38    
+       C
ATOM      8  O   SER A   2      46.522  60.071  84.270  1.00 32.21    
+       O
ATOM      9  CB  SER A   2      47.972  57.377  84.387  1.00 29.10    
+       C
ATOM     10  OG  SER A   2      48.326  56.931  83.084  1.00 25.44    
+       O
ATOM     11  N   HIS A   3      45.172  59.000  82.838  1.00 24.08    
+       N
ATOM     12  CA  HIS A   3      44.778  60.271  82.173  1.00 20.59    
+       C
ATOM     13  C   HIS A   3      43.299  60.425  81.787  1.00 18.87    
+       C
ATOM     14  O   HIS A   3      42.664  59.463  81.375  1.00 16.14    
+       O
ATOM     15  CB  HIS A   3      45.597  60.514  80.897  1.00 17.13    
+       C
ATOM     16  CG  HIS A   3      47.060  60.648  81.145  1.00 20.02    
+       C
ATOM     17  ND1 HIS A   3      47.596  61.677  81.887  1.00 20.09    
+       N
ATOM     18  CD2 HIS A   3      48.099  59.855  80.797  1.00 21.10    
+       C
ATOM     19  CE1 HIS A   3      48.904  61.516  81.989  1.00 19.86    
+       C
ATOM     20  NE2 HIS A   3      49.233  60.417  81.333  1.00 18.86    
+       N
[download]

The co-ordinates are indeed cartesian, the last three columns are meaningless.

The other important columns are the second through to the sixth columns, in order:

atom number
atom name
residue name
chain id
residue number

I should add that each line is for an atom and not for a residue, though I was originally talking about residue iteration, I use residue iteration to try and avoid any extra atomic iteration, but in reality, the number of lines is about 10-20 times the number of residues themselves, depending on the residue type

The pseudo-code, which I'l post below, will ignore the format listed above, because each line is for one atom, and I have a coded PDB::Atom object (home-made) that takes the line as it's read, and parses it.

For the sake of making the psuedo-code concise, it is indeed 'PSEUDO' so please dont expect it to run at all!

The function 'addToMemory' simply fills the different hashes I use for lookup, especially to recall all the atoms in a residue. It is in there also, that the atomic data is added to the database, so there is one DB call per line in the file, this is one bottleneck, but much less of a bottleneck than the sheer number of iterations themselves

I try to cut down on the iterations, by pre-calculating the 3D center of the residue, and then comparing the distance of a residue pair to a hard-coded cut-off (varies depending on residues themselves) in 'notClose' (not shown). This avoids having to iterate through all the atoms in a residue, if there is no chance of a bond.

Finally the bond detection itself is in another function, not shown, and doesn't necessarily return a bond, there is more calculations depending on the nature of the atom itself. I just wanted to show the nature of the iterations themselves.

It should be noted that there is a large amount of processing for particular residues and atoms due to possible errors in the file, which I've excluded.

 open (FILE, "< ".$file ) or die "\ncouldn't open FILE: ".$file.": $! 
+";
    my $i=0;
    while(<FILE>){
    if($_ =~ /^ATOM/){
        $temp = new PDB::Atom('-line' => $_, '-number'=>$i);
        addToMemory($self, $temp, $i);

        ########bond detection ################# 
            ###iterate through chains###
        foreach my $ch (sort {$a <=> $b} keys %{$self->{'contacts'}){
            next if $temp->chain eq $ch; #skip same chains

            ###iterate through residues###
        foreach my $r (sort {$a <=> $b} keys %{$self->{'contacts'}{$ch
+}}){
            next if notClose($temp,self->{'contacts'}{$ch}{$r});
            foreach my $a (sort {$a <=> $b} keys %{$self->{'residues'}
+{$ch.$r}}){
                                $bonds->newBond($temp,$self->{'all'}{$
+a});
            }
            }
        }
        }
    }
    close(FILE);

sub addToMemory{
    my ($self, $atom, $numb) = @_;

    $self->{'all'}{$numb}=$atom;
    unless($atom->proton){
    $self->{'atoms'}{$numb} = 1;
    }else{
    $self->{'protons'}{$numb} = 1;
    }
    
    $self->{'contacts'}{$atom->chain}{$atom->resNumber}=residualAverag
+e($atom, $self->{'contacts'}{$atom->chain}{$atom->resNumber});
    $self->{'residues'}{$ch.$atom->resNumber}{$numb}=1;
}
[download]

Edited by Chady -- converted pre to code tags.

In reply to Re: Iteration speed by seaver
in thread Iteration speed by seaver

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.