in reply to Re: Optimizing PDB data structures
in thread Optimizing PDB data structures

Mark

Thanks for your reply. I apologise for not explaining further: $ref is simply the atom number in the PDB file, which is unique for every atom.

Hence, your solution would work just as well, since every line itself carries the chain id and residue details.

However, PDB files can be large. For example, the one I'm dealing with right now has 30 models of two chains with thousands of atoms, so that's 50k $ref to step through, and in many cases I'm just stepping through the models, chains, or residues.

Being able to choose one chain directly would halve the number of atoms to step through.

Any more suggestions?

Thanks
Sam Seaver

Re: Re: Re: Optimizing PDB data structures
by kvale (Monsignor) on Sep 23, 2003 at 17:17 UTC
    Ah, I see your goal now. I have two answers to your problem.

    The first is to simply ignore this possible speed optimization. If you are picking one of two chains, use

    foreach my $ref (keys %$self) {
        # == compares; a single = would assign 1 and never skip anything
        next unless $self->{$ref}{'chain'} == 1;
        # process chain 1 atoms
    }
    The cost of looping and one nested dereference is probably negligible compared with the other processing you need to do, so don't waste your time on it until you have verified that this is a bottleneck and that the slowdown matters to you.
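One way to check whether the filter loop is a bottleneck is to time it with the core Benchmark module. The following is only a rough sketch: the 50,000 synthetic atoms, the alternating chain assignment, and the helper name count_chain are made up for illustration and are not from the original code.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Synthetic stand-in for the flat structure: atom number => { chain => ... }.
# Odd atom numbers land in chain 2, even ones in chain 1 (illustrative only).
my %atoms;
$atoms{$_} = { chain => 1 + $_ % 2 } for 1 .. 50_000;
my $self = \%atoms;

# Walk every atom and skip those that are not in the wanted chain.
sub count_chain {
    my ($data, $chain) = @_;
    my $n = 0;
    foreach my $ref (keys %$data) {
        next unless $data->{$ref}{chain} == $chain;
        $n++;    # real per-atom processing would go here
    }
    return $n;
}

# Time 20 full passes over the structure.
my $t = timeit(20, sub { count_chain($self, 1) });
print "chain 1 atoms: ", count_chain($self, 1), "\n";
print "20 passes: ", timestr($t), "\n";
```

If the reported time is small next to the rest of the per-atom work, the flat structure is probably good enough.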

    If the bottleneck is a real problem, you will have to promote the variables you want to subset on to the top level of the hash, which gives an uglier data structure:

    foreach my $ref (keys %$data) {
        next unless $data->{$ref}->type eq 'ATOM';
        my $atom = $data->{$ref}->atom;
        $self->{$atom->chainId}{$ref}{atoms} = $atom;
        $self->{$atom->chainId}{$ref}{'residues'} = $atom->resNumber;
    }
    # ...
    foreach my $ref (keys %{$self->{1}}) {
        # process chain 1 atoms
    }
    With an extra dereference per atom, I am not convinced that this will be noticeably faster.
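The chain-first layout can be tried out on a few hand-made records. This is a minimal sketch under stated assumptions: plain hashes with hypothetical keys type, chain, and res stand in for the objects used above, and %by_chain is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical flat records keyed by atom number; plain hashes stand in
# for the record objects used in the post above.
my %data = (
    1 => { type => 'ATOM',   chain => 1, res => 10 },
    2 => { type => 'ATOM',   chain => 1, res => 10 },
    3 => { type => 'ATOM',   chain => 2, res => 11 },
    4 => { type => 'HETATM', chain => 2, res => 12 },
);

# Promote the chain id to the top level: chain => atom number => record.
my %by_chain;
foreach my $ref (keys %data) {
    my $rec = $data{$ref};
    next unless $rec->{type} eq 'ATOM';
    $by_chain{ $rec->{chain} }{$ref} = $rec;
}

# Only the atoms of chain 1 are visited here.
my @chain1 = sort { $a <=> $b } keys %{ $by_chain{1} };
print "chain 1 atoms: @chain1\n";    # prints "chain 1 atoms: 1 2"
```

The index stores references to the existing records rather than copies, so it costs extra hash slots but does not duplicate the atom data.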

    -Mark