jcklasseter has asked for the wisdom of the Perl Monks concerning the following question:

So, I'm trying to analyze some data. I have a file that looks like this :
SCF DONE 1.000000000000000 30.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 30.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000 30.0000000000000000 Ti O 2 4 Direct 0.6464566666666656 0.6448333333333309 0.6278799999999976 0.6868766666666630 0.6884999999999977 0.7054533333333310 0.6853766666666701 0.6920033333333322 0.6431533333333306 0.7078133333333341 0.7139733333333353 0.7533000000000030 0.6234800000000007 0.6190766666666647 0.5814200000000014 0.6493600000000015 0.6410499999999999 0.6899299999999968 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00 0.00000000E+00
and I'm trying to analyze the distance between the xyz coordinates located between
Direct
and
0.00000000E+00
So far, this is what I have.
#!/usr/bin/perl use strict; use warnings; use diagnostics; my $source = "./CONTCAR"; my $destination = "./OUTPUT"; open(IN, '<', $source) or die "Couldn't open $source: $!\n"; open(OUT, '>', $destination) or die "Couldn't write to $destinatio +n: $!\n"; my @data = map [ split ], grep /\S/, <IN>; foreach (@data) { print "$_\n"; } for my $i (0 .. $#data) { for my $j (0 .. $#data) { next if $i == $j; my @coords_i = @{$data[$i]}[0,1,2]; my @coords_j = @{$data[$j]}[0,1,2]; printf OUT "%s to %s Distance=%.5f\n", $data[$i][0], $data[$j][0], $data[$i][2], $data[$j][2], distance(\@coords_i, \@coords_j); } } sub distance { my ($aa, $bb) = @_; my ($x, $y, $z) = map { $aa->[$_] - $bb->[$_] } 0 .. $#$aa; return sqrt(($x - $x)**2 + ($y - $y)**2 + ($z - $z)**2); } close IN; close OUT; print "Done.\n";
In the output file, this :
to 2 Distance=1.00000 1 to 3 Distance=1.00000 1 to 4 Distance=1.00000 1 to 5 Distance=1.00000 2 to 1 Distance=2.00000 2 to 3 Distance=2.00000 2 to 4 Distance=2.00000 2 to 5 Distance=2.00000 3 to 1 Distance=3.00000 3 to 2 Distance=3.00000 3 to 4 Distance=3.00000 3 to 5 Distance=3.00000 4 to 1 Distance=4.00000 4 to 2 Distance=4.00000 4 to 3 Distance=4.00000 4 to 5 Distance=4.00000 5 to 1 Distance=5.00000 5 to 2 Distance=5.00000 5 to 3 Distance=5.00000 5 to 4 Distance=5.00000
is printed. I'm not quite sure if I'm calling the distance wrong, or if my formula is messed up, or all of the above. Any help is appreciated. EDIT: I know the script I have won't analyze the sample input I gave. I have a test file (./CONTCAR) that is literally just
1 2 3 1 2 3 1 2 3 1 2 3
To try and get the main loop even working. EDIT: Updated info

Replies are listed 'Best First'.
Re: XYZ Manipulation
by toolic (Bishop) on Jun 15, 2015 at 17:14 UTC
    You open the IN filehandle, but you never use it. Perhaps you meant:
    my @data = map [split], grep /\S/, <IN>;

    You open the OUT filehandle, but you never use it. Perhaps you meant:

    printf OUT "%s to %s Distance-%.5f\n", $data[$i][1], $data[$j] +[1], $data[$i][11], $data[$j][11], distance( \@coords_i, \@coords_j ) +;

    I still get warnings. It looks like the 1st time you call "distance", you pass it arrays with undef's in them. See also the Basic debugging checklist.

      Okay. I fixed that, but also get warnings. I'm unsure of the undef issue as well.
      String found where operator expected at ./t2 line 29, near "<OUT> "%s + to %s Distance-%.5f\n"" (#1) (S syntax) The Perl lexer knows whether to expect a term or an ope +rator. If it sees what it knows to be a term when it was expecting to see + an operator, it gives you this warning. Usually it indicates that an operator or delimiter was omitted, such as a semicolon. (Missing operator before "%s to %s Distance-%.5f\n"?) syntax error at ./t2 line 29, near "<OUT> "%s to %s Distance-%.5f\n"" Execution of ./t2 aborted due to compilation errors (#2) (F) Probably means you had a syntax error. Common reasons include +: A keyword is misspelled. A semicolon is missing. A comma is missing. An opening or closing parenthesis is missing. An opening or closing brace is missing. A closing quote is missing. Often there will be another error message associated with the synt +ax error giving more information. (Sometimes it helps to turn on -w. +) The error message itself often tells you where it was in the line +when it decided to give up. Sometimes the actual error is several toke +ns before this, because Perl is good at understanding random input. Occasionally the line number may be misleading, and once in a blue + moon the only way to figure out what's triggering the error is to call perl -c repeatedly, chopping away half the program each time to se +e if the error went away. Sort of the cybernetic version of 20 ques +tions. Uncaught exception from user code: syntax error at ./t2 line 29, near "<OUT> "%s to %s Distance-%.5f +\n"" Execution of ./t2 aborted due to compilation errors.
        My code does not use angle brackets around OUT. Is your code: printf <OUT> ... ? If so, then you incorrectly copied my code.
Re: XYZ Manipulation
by BrowserUk (Patriarch) on Jun 15, 2015 at 17:21 UTC

    Where did you obtain this script?

    The first problem is that you are opening your input file:

    open(IN, '<', $source) or die "Couldn't open $source: $!\n";

    But then trying to read your input from stdin:

    my @data = map [ split ], grep /\S/, <>; ### <> reads from stdin (or +a file named on the command line) You would need <IN> instead.

    That problem explains why the script "gets stuck and doesn't exit.". It is waiting for input from the command which you aren't supplying.

    The bunch of error messages you get when you added the print statement suggest that you forgot to add the ';' at the end of the line.

    Try replacing <> with <IN> in this line:

    my @data = map [ split ], grep /\S/, <>;
    And then either remove the print statement you added; or add the semicolon on the end; then come back to us with your progress.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      I obtained parts of it from
      http://stackoverflow.com/questions/20741484/calculating-distance-betwe +en-atoms-by-atomic-coordinates?rq=1
      . <> Replaced. I got error messages with  print ....., as well as  print......, ; as well as their printf deviations. Updated original post with corrections. The getting stuck part makes sense, however. Thanks.

        Hello jck, as I look at the code on the link and the code on this node, I see discrepancies. At some point, you can't go back and "fix" the original post, as it appears you're trying to reconcile 2 distance functions which are similar in that they both measure distance, but dissimilar in that they have different variables and indices. It's nothing that can't be ironed out, but at some point, you're gonna have to choose what you're gonna go with, so that we can all be on the same page. Let's take a look at the main control of your (updated?) original post:

        for my $i (0 .. $#data) { for my $j (0 .. $#data) { next if $i == $j; my @coords_i = @{$data[$i]}[1,2,3]; my @coords_j = @{$data[$j]}[1,2,3]; print OUT "%s to %s Distance-%.5f\n", ####( ; )?? $data[$i][1], $data[$j][1], $data[$i][11], $data[$j][11], distance(\@coords_i, \@coords_j); } }

        We have an explicit double loop, which will allow the ordered enumeration you desire to occur, but I can see that it goes over the entire array twice, which doesn't gybe with the fields at the link you posted. When I run your program, I get this to confirm my suspicion:

        C:\Users\Fred\Desktop\pm>perl dis1.pl Argument "CA" isn't numeric in subtraction (-) at dis1.pl line 38, <IN +> line 4 ( #1) (W numeric) The indicated string was fed as an argument to an oper +ator that expected a numeric value instead. If you're fortunate the me +ssage will identify which operator was so unfortunate. Argument "N" isn't numeric in subtraction (-) at dis1.pl line 38, <IN> + line 4 (# 1) Argument "MSE" isn't numeric in subtraction (-) at dis1.pl line 38, <I +N> line 4 (#1) Argument "C" isn't numeric in subtraction (-) at dis1.pl line 38, <IN> + line 4 (# 1) Argument "O" isn't numeric in subtraction (-) at dis1.pl line 38, <IN> + line 4 (# 1) Done.

        That's perl saying "you can't subtract strings." You need to adjust your indices. If you showed your actual input, actual script, and actual output between code tags on perlmonks, then I'm pretty sure that your troubles are ephemeral. Other than that, you're not all that far from what you want because the output is well-formed:

        %s to %s Distance-%.5f 12NC1%s to %s Distance-%.5f 13NC2%s to %s Distance-%.5f 14NO3%s to %s Distance-%.5f 21CN1%s to %s Distance-%.5f 23CC1%s to %s Distance-%.5f 24CO2%s to %s Distance-%.5f 31CN2%s to %s Distance-%.5f 32CC1%s to %s Distance-%.5f 34CO1%s to %s Distance-%.5f 41ON3%s to %s Distance-%.5f 42OC2%s to %s Distance-%.5f 43OC1

        Best of luck and many happy adventures with perl.

Re: XYZ Manipulation
by RichardK (Parson) on Jun 15, 2015 at 17:18 UTC

    You're not doing anything with IN & OUT so you're not actually reading your input file.

    Why not start simply with a while loop, and ignore map and grep until you get your algorithm working.

    while (my $line = <IN>) { chomp $line; next unless $line; # do stuff with each line here }

    BTW what did you think "printf $destination ..." was going to do?

      Well, I had assumed it would print the output to the specified file?
Re: XYZ Manipulation
by salva (Canon) on Jun 16, 2015 at 08:15 UTC
    If you plan to do calculations using vectors, you should consider using some CPAN module as Math::Vector::Real. For instance:
    use Math::Vector::Real; my @data = map V(split), grep /\S/, <IN>; for my $i (0 .. $#data) { my $u = $data[$i] for my $j (0 .. $#data) { next if $i == $j; my $v = $data[$j]; my $dist = $u->dist($v); print OUT "%s to %s Distance-%.5f\n", $u, $v, $dist; } }

    PDL is another interesting tool for numeric intensive computations. It has a somewhat steep learning curve, but in the end it is quite productive.

Re: XYZ Manipulation
by segaknight (Initiate) on Jun 16, 2015 at 16:55 UTC

    I'm sorry code works this is my first HTML post, this line is printing an extra + but I have to go home from work now

    #!/usr/bin/perl @coords_i=();@coords_j=(); open (IN,"CONTCAR"); open (OUT,">OUTPUT"); $k=$go=$i=0; while($line=<IN>){ $lines[$i]=$line; $i++; } for ($j=0;$j<$i;$j++){ $go++ if $lines[$j]=~/Direct/; $go=0 if $lines[$j]=~/0.00000000E/; if ($go>0){ if(($x,$y,$z)=$lines[$j]=~/^\s*([-+]?\d+\.\d+)\s+([-+]?\d+\.\d+)\s+([- ++]?\d+\.\d+)/){ $coords_i[$k]=sprintf("%.5f:%.5f:%.5f",$x,$y,$z); $coords_j[$k]=sprintf("%.5f:%.5f:%.5f",$x,$y,$z); $k++; } } } $n=1; foreach $atom (@coords_i){ print OUT "ATOM$n: $atom\n"; foreach $atom1 (@coords_j){ unless($atom eq $atom1){ ($x,$y,$z) = split(/:/,$atom); ($x1,$y1,$z1) = split(/:/,$atom1); &distance; } } $n++; } sub distance { $a1=($x1-$x);$b1=($y1-$y);$c1=($z1-$z); $r1=sqrt(($a1*$a1)+($b1*$b1)+($c1*$c1)); $distance=sprintf("%.5f",$r1); print OUT "$atom to $atom1\nDistance $distance\n"; } close (OUT); print STDERR "Done.\n";
Re: XYZ Manipulation
by segaknight (Initiate) on Jun 16, 2015 at 16:30 UTC
    This works but for CONTCAR example above which provides no atomnames so plenty of room to change the output formatting.
    #!/usr/bin/perl @coords_i=();@coords_j=(); open (IN,"CONTCAR"); open (OUT,">OUTPUT"); $k=$go=$i=0; while($line=<IN>){ $lines[$i]=$line; $i++; } for ($j=0;$j<$i;$j++){ $go++ if $lines[$j]=~/Direct/; $go=0 if $lines[$j]=~/0.00000000E/; if ($go>0){ if(($x,$y,$z)=$lines[$j]=~/^\s*([-+]?\d+\.\d+)\s+([-+]?\d+\.\d+)\s+([- ++]?\d+\.\d+)/){ $coords_i[$k]=sprintf("%.5f:%.5f:%.5f",$x,$y,$z); $coords_j[$k]=sprintf("%.5f:%.5f:%.5f",$x,$y,$z); $k++; } } } $n=1; foreach $atom (@coords_i){ print OUT "ATOM$n: $atom\n"; foreach $atom1 (@coords_j){ unless($atom eq $atom1){ ($x,$y,$z) = split(/:/,$atom); ($x1,$y1,$z1) = split(/:/,$atom1); &distance; } } $n++; } sub distance { $a1=($x1-$x);$b1=($y1-$y);$c1=($z1-$z); $r1=sqrt(($a1*$a1)+($b1*$b1)+($c1*$c1)); $distance=sprintf("%.5f",$r1); print OUT "$atom to $atom1\nDistance $distance\n"; } close (OUT); print STDERR "Done.\n";