Re: Optimizing PDB data structures

You don't tell us what $ref is, so I'll assume that it is some primary key in a flat database. From the PDB format spec, each atom has one chainID and one resSeq, which I guess you are calling resNumber.

So I'd rejig the data structure as:

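foreach my $ref (keys %$data){
   # keep only ATOM records: one flat entry per atom, keyed by $ref
   if($data->{$ref}->type eq 'ATOM'){
      my $atom = $data->{$ref}->atom;
      $self->{$ref}{'atoms'}    = $atom;              # the atom object itself
      $self->{$ref}{'residues'} = $atom->resNumber;   # its residue number (resSeq)
      $self->{$ref}{'chain'}    = $atom->chainId;     # its chain ID
   }
}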

Using this, one can still step through chainIDs and resNumbers by extracting all $ref keys.
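Here is a minimal sketch of that stepping. It assumes plain hash records carrying the 'chain' and 'residues' keys used above, in place of the atom objects, and the sample data is made up. One pass over the $ref keys groups them by chain and then by residue. After that, chains and residues can be walked in order.

```perl
use strict;
use warnings;

# Hypothetical sample data in the flat layout above:
# atom number ($ref) => { chain => ..., residues => ... }.
# Plain hashes stand in for the atom objects, which are not shown.
my %flat = (
    1 => { chain => 'A', residues => 1 },
    2 => { chain => 'A', residues => 1 },
    3 => { chain => 'A', residues => 2 },
    4 => { chain => 'B', residues => 1 },
);

# One pass over every $ref key builds chain => residue => [atom refs].
# Sorting the refs numerically keeps atoms in file order within a residue.
my %index;
for my $ref (sort { $a <=> $b } keys %flat) {
    my $rec = $flat{$ref};
    push @{ $index{ $rec->{chain} }{ $rec->{residues} } }, $ref;
}

# Step through chain IDs (string sort) and residue numbers (numeric sort).
for my $chain (sort keys %index) {
    for my $res (sort { $a <=> $b } keys %{ $index{$chain} }) {
        my @refs = @{ $index{$chain}{$res} };
        print "chain $chain residue $res: atoms @refs\n";
    }
}
```

The grouping pass still touches every atom once. Any later loop over a single chain or residue only visits that group's refs.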

-Mark

Re: Re: Optimizing PDB data structures by seaver (Pilgrim) on Sep 23, 2003 at 13:15 UTC
Mark,

Thanks for your reply, and I apologise for not explaining further. $ref is simply the atom number in the PDB file, which is unique for every atom. Hence your solution would work just as well, as every line carries its own chain ID and residue details.

However, PDB files can be large. For example, the one I'm dealing with right now has 30 models of two chains with thousands of atoms, so that's 50k $ref keys to step through. In many cases I'm only stepping through the models, chains, or residues. Being able to choose one chain directly would halve the number of atoms to step through.

Any more suggestions?

Thanks,
Sam Seaver
Re: Re: Re: Optimizing PDB data structures by kvale (Monsignor) on Sep 23, 2003 at 17:17 UTC
Ah, I see your goal now. I have two answers to your problem.

The first is simply to ignore this possible speed optimization. If you are picking one of two chains, use:

foreach my $ref (keys %$self){
   next unless $self->{$ref}{'chain'} eq '1';   # chain IDs are strings, so compare with eq
   # process chain 1 atoms
}

The cost of the loop and one nested dereference is probably negligible compared with the other processing you need to do. So don't spend time on it until you have verified that this loop is a bottleneck and that the slowdown matters to you.

If the bottleneck is a real problem, you will have to promote the fields you subset on to the top level of the hash, which gives an uglier data structure:

foreach my $ref (keys %$data){
   next unless $data->{$ref}->type eq 'ATOM';
   my $atom = $data->{$ref}->atom;
   $self->{$atom->chainId}{$ref}{'atoms'}    = $atom;
   $self->{$atom->chainId}{$ref}{'residues'} = $atom->resNumber;
}
# ...
foreach my $ref (keys %{$self->{1}}) {
   # process chain 1 atoms
}

With an extra dereference per atom, I am not convinced that this will be noticeably faster.

-Mark
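Here is a minimal runnable sketch of the chain-keyed layout from the reply above, in isolation. It assumes plain hashes with hypothetical `type`, `chain` and `res` fields in place of the `->type`, `->atom`, `->chainId` and `->resNumber` objects, which the thread does not show. It also uses letter chain IDs ('A', 'B'), as PDB files normally carry, rather than the 1 in the posts.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the parsed records in %$data.
# Plain hashes replace the objects used in the posts.
my %data = (
    1 => { type => 'ATOM',   chain => 'A', res => 1 },
    2 => { type => 'ATOM',   chain => 'A', res => 2 },
    3 => { type => 'ATOM',   chain => 'B', res => 1 },
    4 => { type => 'HETATM', chain => 'B', res => 2 },
);

# Chain ID promoted to the top level: chain => $ref => record.
my %by_chain;
for my $ref (keys %data) {
    my $rec = $data{$ref};
    next unless $rec->{type} eq 'ATOM';    # skip non-ATOM records
    $by_chain{ $rec->{chain} }{$ref} = {
        atoms    => $rec,
        residues => $rec->{res},
    };
}

# Only chain A's atoms are visited; chain B is never looped over.
for my $ref (sort { $a <=> $b } keys %{ $by_chain{A} }) {
    print "chain A atom $ref, residue $by_chain{A}{$ref}{residues}\n";
}
```

Selecting a chain is now a single hash lookup. The same idea extends to models by adding one more level (model, then chain, then $ref), at the cost of one more dereference per access.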