stellaparallax has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am creating a program that calculates the distance between the x, y, z coordinates of atoms listed in a pdb file. So far I have this:

#!/usr/bin/perl -w

$num = 0;
$count = 0;

while (<>) {
    # Find x, y, z coordinates and store in separate arrays
    if ($_ =~ /^ATOM/) {
        @line = $_ =~ m/^(.....).(.....).(....).(...)..(....)....(........)(........)(........)/;
        $x = $line[5];
        $arrayx[$num] = $x;
        $y = $line[6];
        $arrayy[$num] = $y;
        $z = $line[7];
        $arrayz[$num] = $z;
        ++$num;
    }
    # Count number of atoms
    if ($_ =~ /^ATOM/) {
        ++$count;
    }
}

# Calculate distance between all atom coordinates
foreach $i (0..$count) {
    foreach $j ($i + 1..$count) {
        $dist = sqrt(
            ($arrayx[$i] - $arrayx[$j])**2 +
            ($arrayy[$i] - $arrayy[$j])**2 +
            ($arrayz[$i] - $arrayz[$j])**2
        );
        print "$dist\n";
    }
}

When I run the program I get this message for some of the lines, and I don't know how to fix it:

"Use of uninitialized value in subtraction (-) at ./gas.pl line 42, <> line 14368"

The line it refers to is the last line of the pdb file, but I don't see why that line is involved in my calculations, as it is not present in any of my arrays. The pdb file I'm using is 3PBL.pdb (sorry, I wasn't able to attach it or post a link, but it is easy to find if you search for that name on Google). Any help would be much appreciated, as I am VERY new to Perl. Thanks

Replies are listed 'Best First'.
Re: Calc distance between atoms in pdb file
by BrowserUk (Patriarch) on Apr 29, 2012 at 13:19 UTC

    Here's a somewhat simpler and more idiomatic version of your code that does the same thing; it may help you. Ask about anything you do not understand:

    #!/usr/bin/perl -w
    use strict;

    my( @arrayx, @arrayy, @arrayz );

    while (<>) {
        # Find x, y, z coordinates and store in separate arrays
        if ($_ =~ /^ATOM/) {
            my @line = $_ =~ m/^(.....).(.....).(....).(...)..(....)....(........)(........)(........)/;

            ## using push means you don't have to count because ...
            push @arrayx, $line[5];
            push @arrayy, $line[6];
            push @arrayz, $line[7];
        }
    }
    close *ARGV; ## prevent confusing error message suffixes

    # Calculate distance between all atom coordinates
    ## ... $#xxx gives you the highest index in array @xxx
    foreach my $i ( 0 .. $#arrayx ) {
        foreach my $j ( $i + 1 .. $#arrayx ) {
            my $dist = sqrt(
                ($arrayx[$i] - $arrayx[$j])**2 +
                ($arrayy[$i] - $arrayy[$j])**2 +
                ($arrayz[$i] - $arrayz[$j])**2
            );
            ## Adding $i and $j to your output will let you know what that output is.
            print "$i <> $j : $dist\n";
        }
    }

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I'm getting an error: "Use of uninitialized value in subtraction" at line 47.

      foreach my $i ( 0 .. 1520 ) {
          foreach my $j ( $i + 1 .. 1520 ) {
              my $dist = sqrt(
                  ($arrayx[$i] - $arrayx[$j])**2 +
                  ($arrayy[$i] - $arrayy[$j])**2 +
                  ($arrayz[$i] - $arrayz[$j])**2
              );
              ## Adding $i and $j to your output will let you know what that output is.
              print "$i <> $j : $dist\n";
          }
      }
      exit 0;
        Where does the 1520 come from? Obviously not from the originally posted code.
      I'm still getting "Use of uninitialized value in subtraction".
Re: Calc distance between atoms in pdb file
by BrowserUk (Patriarch) on Apr 29, 2012 at 13:10 UTC
    "Use of uninitialized value in subtraction (-) at ./gas.pl line 42, <> line 14368" The line that it states is the last line of the pdb file, however i don't see why this line is involved in my calculations as this is not present in any of my arrays.

    This part: "<> line 14368" is not relevant to the actual error. It simply means that was the last line read from the last file that is still open. Perl appends it to the error message because the file is still open, and it might therefore be relevant. In this case, it isn't.

    You can suppress that part of the error messages by closing open files when you are done with them. In this case, the open file is the implicit file handle opened by the use of the diamond operator (while (<>)). To close it, follow the loop with:

        # Count number of atoms
        if ($_ =~ /^ATOM/) {
            ++$count;
        }
    }
    close *ARGV; ## Add this.
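
    The effect of closing the handle can be seen in a small, self-contained sketch. It does not use the OP's script: the in-memory file and the lexical handle $fh are stand-ins chosen for illustration, and the warnings are captured in @messages so the suffix can be inspected.

```perl
use strict;
use warnings;

# Hypothetical in-memory "file" standing in for the PDB input.
my $data = "line one\nline two\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Collect warning text instead of printing it straight to STDERR.
my @messages;
local $SIG{__WARN__} = sub { push @messages, $_[0] };

1 while <$fh>;          # read to end of file; the handle stays open
warn "before close";    # Perl appends a suffix such as ", <$fh> line 2."

close $fh;              # closing resets the handle's line counter
warn "after close";     # no "<...> line N" suffix this time

print for @messages;
```

    The same reasoning applies to the implicit ARGV handle used by the diamond operator, which is why close *ARGV removes the suffix in the original script.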

    You are getting the "uninitialised value" warnings because your for loops are running off the end of your arrays.

    The variable $count counts the number of elements in the arrays, but the indexes run from 0; so the last index will be one less than the number of elements!

    I.e., an array that contains the 10 values 1 .. 10 will have indexes 0 .. 9:

    @a = 1 .. 10;

    $a[0] = 1;
    $a[1] = 2;
    $a[2] = 3;
    ...
    $a[8] = 9;
    $a[9] = 10;

    So, to prevent the error detected by the warnings, your for loops should run from 0 to $count - 1. I.e.:

    # Calculate distance between all atom coordinates
    foreach $i ( 0 .. $count-1 ) {
        foreach $j ( $i + 1 .. $count-1 ) {
            $dist = sqrt(
                ($arrayx[$i] - $arrayx[$j])**2 +
                ($arrayy[$i] - $arrayy[$j])**2 +
                ($arrayz[$i] - $arrayz[$j])**2
            );
            print "$dist\n";
        }
    }
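
    To make the off-by-one concrete, here is a minimal sketch using a made-up three-element array (the values and the names @pairs_ok and @pairs_bad are illustrative, not from the OP's data). It compares the index pairs produced by the two loop bounds.

```perl
use strict;
use warnings;

# Hypothetical x coordinates for three atoms.
my @arrayx = (1.5, 2.5, 4.0);

my $count      = scalar @arrayx;   # number of elements: 3
my $last_index = $#arrayx;         # highest valid index: 2

print "count = $count, last index = $last_index\n";

# Index $count is one past the end, so the element there is undefined.
print "element at index $count is ",
      (defined $arrayx[$count] ? 'defined' : 'undefined'), "\n";

# Collect the index pairs visited with each loop bound.
my (@pairs_ok, @pairs_bad);
for my $i (0 .. $count - 1) {
    for my $j ($i + 1 .. $count - 1) { push @pairs_ok, "$i-$j" }
}
for my $i (0 .. $count) {
    for my $j ($i + 1 .. $count) { push @pairs_bad, "$i-$j" }
}
print "0 .. \$count - 1 : @pairs_ok\n";
print "0 .. \$count     : @pairs_bad\n";
```

    Every extra pair in the second list involves index 3, which does not exist in a three-element array; each of those is where a subtraction would see an undefined value.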

    That should get you going. There are several other changes that would make your life easier, but I'll leave that for other posts.

    One thing that intrigues me though. How did @line get transmuted into (at)line in your post?



Re: Calc distance between atoms in pdb file
by toolic (Bishop) on Apr 29, 2012 at 12:51 UTC
    Since I cannot easily reproduce your problem, my best guess is that you either have an off-by-one error in your nested foreach loops or your arrays don't have as many elements as you think they do.

    Before your foreach loops, you can check the number of elements in the arrays:

    print scalar(@arrayx), "\n";
    print scalar(@arrayy), "\n";
    print scalar(@arrayz), "\n";

    If that doesn't solve it, add print statements inside your foreach loops.

    Another good practice is to check if your regex matches:

    if (@line = $_ =~ m/^(.....).(.....).(....).(...)..(....)....(........)(........)(........)/) {
        $x = $line[5];
        $arrayx[$num] = $x;
        # more code
    }
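
    Why that if works may not be obvious, so here is a small sketch. The pattern and the two sample records are hypothetical and deliberately much simpler than the fixed-width PDB layout; only the match-checking idiom is the point.

```perl
use strict;
use warnings;

# Hypothetical pattern: three whitespace-separated numbers after "ATOM".
my $pattern = qr/^ATOM\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)/;

my @results;
for my $record ("ATOM 1.0 2.0 3.0", "ATOM truncated") {
    # A list assignment in boolean context yields the number of values
    # on its right-hand side, so a failed match (empty list) is false.
    if (my @fields = $record =~ $pattern) {
        push @results, "matched: @fields";
    }
    else {
        push @results, "no match: $record";
    }
}
print "$_\n" for @results;
```

    Without such a check, a line that starts with ATOM but does not fit the pattern would store undefined values in the arrays, which would later show up as the same "uninitialized value" warning.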


      Thanks so much, everyone, for taking the time to help a newbie out :). It makes sense now and it works, yay!

        Can you provide me the full code please?

Re: Calc distance between atoms in pdb file
by brx (Pilgrim) on Apr 29, 2012 at 18:53 UTC

    Post some real input data: answers will be better, with a better regex/method.

    As you read each line, you can calculate the distance to all previous points; there is no need for a double loop after reading the whole file.
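
    A rough sketch of that single-pass idea, under stated assumptions: the distance helper and the @incoming list are hypothetical stand-ins for coordinates already extracted from ATOM lines, and the results are collected in @distances rather than printed immediately.

```perl
use strict;
use warnings;

# Hypothetical helper: Euclidean distance between two [x, y, z] points.
sub distance {
    my ($p, $q) = @_;
    return sqrt( ($p->[0] - $q->[0])**2
               + ($p->[1] - $q->[1])**2
               + ($p->[2] - $q->[2])**2 );
}

# Hypothetical stand-in for coordinates parsed from successive ATOM lines.
my @incoming = ( [0, 0, 0], [3, 4, 0], [3, 4, 12] );

my @points;      # points seen so far
my @distances;   # "i <> j : dist" strings, in the order they are produced
for my $point (@incoming) {
    my $j = @points;                 # index the new point is about to get
    for my $i (0 .. $#points) {
        push @distances, sprintf "%d <> %d : %g", $i, $j,
            distance($points[$i], $point);
    }
    push @points, $point;
}
print "$_\n" for @distances;
```

    The same pairs are visited as with the nested loops, but grouped by the later point rather than the earlier one, so the output order differs. All points still have to be kept in memory.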


    Update: strike first part (sorry OP)
      Post some real input data: answers will be better,

      I found the real input data via the OP's reference.

