in reply to Optimizing PDB data structures

You don't tell us what $ref is, so I'll assume that it is some primary key in a flat database. From the PDB format spec, each atom has one chainID and one resSeq, which I guess you are calling resNumber.

So I'd rejig the data structure as

foreach my $ref (keys %$data) {
    if ($data->{$ref}->type eq 'ATOM') {
        my $atom = $data->{$ref}->atom;
        $self->{$ref}{atoms}      = $atom;
        $self->{$ref}{'residues'} = $atom->resNumber;
        $self->{$ref}{'chain'}    = $atom->chainId;
    }
}
Using this, one can still step through chainIDs and resNumbers by extracting all $ref keys.
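
As a rough sketch of that stepping-through, assuming the flat layout above: the sample records, the plain strings standing in for atom objects, and the %index name are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical flat structure keyed by atom serial number ($ref).
# Plain strings stand in for the real atom objects.
my $self = {
    1 => { chain => 'A', residues => 1, atoms => 'N'  },
    2 => { chain => 'A', residues => 1, atoms => 'CA' },
    3 => { chain => 'A', residues => 2, atoms => 'N'  },
    4 => { chain => 'B', residues => 1, atoms => 'N'  },
);

# One pass over the $ref keys builds a chain -> residue -> [refs] index.
my %index;
for my $ref (sort { $a <=> $b } keys %$self) {
    my $rec = $self->{$ref};
    push @{ $index{ $rec->{chain} }{ $rec->{residues} } }, $ref;
}

# Step through chains, then residues, then the atoms in each residue.
for my $chain (sort keys %index) {
    for my $res (sort { $a <=> $b } keys %{ $index{$chain} }) {
        print "chain $chain residue $res: @{ $index{$chain}{$res} }\n";
    }
}
```

The index costs one full pass over the keys, so it only pays off if you walk the chains or residues more than once.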

-Mark

Re: Re: Optimizing PDB data structures
by seaver (Pilgrim) on Sep 23, 2003 at 13:15 UTC
    Mark

    Thanks for your reply. I apologise for not explaining further. $ref is simply the atom number in the PDB file, which is unique for every atom.

    Hence your solution would work just as well, as every line itself carries the chain ID and residue details.

    However, PDB files can be large. For example, the one I'm dealing with right now has 30 models of two chains with thousands of atoms, so that's 50k $ref values to step through, and in many cases I'm just stepping through the models, chains, or residues.

    Being able to choose one chain directly would halve the number of atoms to step through.

    Any more suggestions?

    Thanks
    Sam Seaver

      Ah, I see your goal now. I have two answers to your problem.

      The first is to simply ignore this possible speed optimization. If you are picking one of two chains, use

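      foreach my $ref (keys %$self) {
          # 'eq' compares; the original '=' would assign 1 to every atom's chain.
          # String comparison is used because PDB chain IDs are characters.
          next unless $self->{$ref}{'chain'} eq '1';
          # process chain 1 atoms
      }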
      The cost of looping and one nested dereference is probably negligible compared with the other processing you need to do, so don't waste your time on it until you have verified that this is a bottleneck and that the slowdown matters to you.

      If the bottleneck is a real problem, you will have to promote the variables you want to subset on and create an uglier data structure:

      foreach my $ref (keys %$data) {
          next unless $data->{$ref}->type eq 'ATOM';
          my $atom = $data->{$ref}->atom;
          $self->{$atom->chainId}{$ref}{atoms}      = $atom;
          $self->{$atom->chainId}{$ref}{'residues'} = $atom->resNumber;
      }

      # ...

      foreach my $ref (keys %{$self->{1}}) {
          # process chain 1 atoms
      }
      With an extra dereference per atom, I am not convinced that this will be noticeably faster.
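
      One way to check rather than guess is the core Benchmark module. The sketch below is only an illustration on synthetic data: the 50,000-atom count comes from the figure quoted above, while the %flat and %nested hashes and the count_flat and count_nested helpers are hypothetical stand-ins for the two layouts.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data: 50_000 atoms split evenly over two chains (1 and 2).
my (%flat, %nested);
for my $ref (1 .. 50_000) {
    my $chain = $ref % 2 ? 1 : 2;
    $flat{$ref}           = { chain => $chain, residues => int($ref / 10) };
    $nested{$chain}{$ref} = { residues => int($ref / 10) };
}

# Flat layout: visit every atom and skip those not in chain 1.
sub count_flat {
    my $n = 0;
    for my $ref (keys %flat) {
        next unless $flat{$ref}{chain} == 1;
        $n++;
    }
    return $n;
}

# Nested layout: go straight to the sub-hash for chain 1.
sub count_nested {
    my $n = 0;
    for my $ref (keys %{ $nested{1} }) {
        $n++;
    }
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { flat => \&count_flat, nested => \&count_nested });
```

      The loop bodies here do almost nothing, so any difference shown is an upper bound; with real per-atom processing the gap is likely to shrink.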

      -Mark