in reply to Mean calculation problem

If you're only interested in processing the ATOM records, then this may be good enough:

#! perl -slw use strict; use List::Util qw[ sum ]; my %stats; while( <DATA> ) { next unless m[^ATOM]; my @fields = unpack 'a6 a5 a3 a3 a4 x2 a3 x4 a8 a8 a7 a7 a6 a12', +$_; # print join'|', @fields; push @{ $stats{ $fields[ 4 ] } }, $fields[ 10 ]; } printf "%3s : %.3f%%\n", $_, sum( @{ $stats{ $_ } } ) / @{ $stats{ $_ } } for sort keys %stats; =output C:\test>junk2 GLN : 35.962% LEU : 25.250% LYS : 64.087% =cut __DATA__ 1234567890123456789012345678901234567890123456789012345678901234567890 +1234567890 ATOM 1 N LYS A 2 -10.231 0.078 18.128 1.00 28.52 + N ATOM 2 CA LYS A 2 -9.472 -0.327 19.297 1.00 28.52 + C ATOM 3 C LYS A 2 -10.308 -0.875 20.459 1.00 28.52 + C ATOM 4 O LYS A 2 -10.180 -2.054 20.801 1.00 28.52 + O ATOM 5 CB LYS A 2 -8.592 0.833 19.765 1.00 92.54 + C ATOM 6 CG LYS A 2 -7.346 0.373 20.498 1.00 92.54 + C ATOM 7 CD LYS A 2 -6.207 1.375 20.382 1.00 92.54 + C ATOM 8 CE LYS A 2 -4.929 0.810 21.002 1.00 92.54 + C ATOM 9 NZ LYS A 2 -3.756 1.724 20.882 1.00 92.54 + N ATOM 10 N GLN A 3 -11.162 -0.038 21.053 1.00 28.24 + N ATOM 11 CA GLN A 3 -11.998 -0.465 22.181 1.00 28.24 + C ATOM 12 C GLN A 3 -12.749 -1.750 21.849 1.00 28.24 + C ATOM 13 O GLN A 3 -12.882 -2.639 22.694 1.00 28.24 + O ATOM 14 CB GLN A 3 -13.019 0.618 22.556 1.00 42.14 + C ATOM 15 CG GLN A 3 -12.450 2.018 22.718 1.00 42.14 + C ATOM 16 CD GLN A 3 -12.541 2.849 21.437 1.00 42.14 + C ATOM 17 OE1 GLN A 3 -12.042 2.453 20.377 1.00 42.14 + O ATOM 18 NE2 GLN A 3 -13.180 4.012 21.536 1.00 42.14 + N ATOM 1 N LEU A 62 32.071 22.080 63.175 1.00 25.25 + N ATOM 2 CA LEU A 62 32.171 21.102 62.056 1.00 25.25 + C ATOM 3 C LEU A 62 31.367 19.843 62.260 1.00 25.25 + C ATOM 4 O LEU A 62 31.676 18.842 61.650 1.00 25.25 + O ATOM 5 CB LEU A 62 33.640 20.698 61.795 1.00 25.25 + C ATOM 6 CG LEU A 62 34.589 21.606 60.965 1.00 25.25 + C ATOM 7 CD1 LEU A 62 35.877 20.829 60.616 1.00 25.25 + C ATOM 8 CD2 LEU A 62 33.903 22.137 59.675 1.00 25.25 + C ATOM 9 N LEU A 63 30.358 19.879 63.124 1.00 25.25 + N ATOM 10 CA LEU A 63 29.497 18.715 63.396 1.00 25.25 + C ATOM 11 C LEU A 63 28.036 19.160 63.458 1.00 25.25 + C ATOM 12 O LEU A 63 27.574 19.700 64.489 1.00 25.25 + O ATOM 13 CB LEU A 63 29.869 17.995 64.717 1.00 25.25 + C ATOM 14 CG LEU A 63 29.125 16.690 65.118 1.00 25.25 + C ATOM 15 CD1 LEU A 63 29.725 15.515 64.383 1.00 25.25 + C ATOM 16 CD2 LEU A 63 29.208 16.398 66.606 1.00 25.25 + C

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

Replies are listed 'Best First'.
Re^2: Mean calculation problem
by hellworld (Novice) on Jan 13, 2010 at 13:28 UTC
    Thanks for spending time to write the code, helped a lot. But is there a way for me to apply this to a big chunk of data ? I mean, the aminoacid groups go as long as 1,2,...............264 and there can be 2600something ATOM lines depending on the PDB file.

      Sure. If you supply this version:

      #! perl -slw use strict; use List::Util qw[ sum ]; BEGIN{ @ARGV = map glob( $_ ), @ARGV } my %stats; while( <> ) { next unless m[^ATOM]; my @fields = unpack 'a6 a5 a3 a3 a4 x2 a3 x4 a8 a8 a7 a7 a6 a12', +$_; # print join'|', @fields; push @{ $stats{ $fields[ 4 ] } }, $fields[ 10 ]; } printf "%3s : %.3f%%\n", $_, sum( @{ $stats{ $_ } } ) / @{ $stats{ $_ } } for sort keys %stats;

      with a wildcard path/filename on the command line,

      theScript.pl /path/to/*.pdb

      it will perform the stats across all the matching files.

      Alternatively, if you want the stats on a file-by-file basis, then supply the filenames one at a time:

      ## For windows; something similar is possible on *nix for %i in (\path\to\*.pdb) do @theScript.pl \path\to\%i

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.