Re: checking for overlap

Thanks for the extra info. Let's talk rather about making the code more readable and maintainable, instead of "optimizing" it. (or maybe that's what you mean by "optimize"?)

First off, your problem with $rest1[$i] might be due to this:

$i=0;
while ($line=<SOLVFILE>)
{
    chop($line);  # use "chomp" here, not "chop"
    ...
}
[download]

If you're on a system that uses "\r\n" for line termination (i.e. windows), "chop" will only take away the "\n", leaving "\r", whereas "chomp" will take away both. (I'm not really sure that this will fix the problem, but you should use chomp anyway.)

BTW, what value do you see when you print "Number of atoms in solvent file is $Nsolv \n" ?

As for general improvements, many people prefer to feed file names to the script by putting them on the command line with the name of the script:

  my_script.pl  file1  file2
[download]

Then use @ARGV (or just "shift") to do things with the names within the script. Also, when reading from files a line at a time, using $_ instead of a named variable really helps keep it simple. Another thing is that you seem to be declaring variables and assigning file data to them, but never using the data, so you should try to reduce that clutter.

Personally, I would prefer to structure the input data to cut down on the duplicated code -- your two input files seem to be organized the same way, so you could read both files into a single set of hashes, keyed by file name; each hash element would be an array of values to be used from the file (atom names, coords, radii). This means you can loop over the file names, and the code that reads and structures the data is needed in only on place in the script.

(Oh, as I worked out the concept for myself, it looked to me like you have an error in the "sqrt" call -- you use the "z" values twice, and don't use the "y" values at all. Restructuring the code and data storage to reduce the amount of typing will help to avoid such mistakes.)

Finally, proper indentation really does help. Here's one approach -- it parses properly, but I don't have data to test it on:

#!/usr/bin/perl -w

use strict; # painful at first, maybe, but worth it

# using command line args is good (and easy):

die "Usage: $0 solvent.file structure.file\n" unless
    ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );

open( PDB, ">Solvent.pdb" ) or die "...:$!\n";
open( TXT, ">OverL.txt" ) or die "...:$!\n";

my %input;
my %atmnb;   # declare storage near where you need it
my %coord;
my %natms;

for my $file ( @ARGV ) {
    open( IN, $file ) or die "open failed on $file: $!\n";

    while (<IN>) { # use a slice on split to get just the needed data:
        my ($atb,$x,$y,$z,$rad) = (split)[1,5,6,7,9];
        push @{$input{$file}}, $_;  # keep full input line
        push @{$atmnb{$file}}, $atb;
        push @{$coord{$file}}, [ $x, $y, $z, $rad ];
    }
    close IN;
    print "Number of atoms in $file is $.\n";
    $natms{$file} = $.;
}

my ($f1,$f2) = @ARGV;

for my $i ( 0 .. $natms{$f1}-1 ) {
    for my $j ( 0 .. $natms{$f2}-1 ) {
# use map to compute the x,y,z diffs just once each:
        my @difs = map { $coord{$f1}[$i][$_] - $coord{$f2}[$j][$_] } (
+ 0, 1, 2 );
        my $dist = sqrt( $difs[0] * $difs[0] +
                         $difs[1] * $difs[1] +
                         $difs[2] * $difs[2] );
        my $rsum = $coord{$f1}[$i][3] + $coord{$f2}[$j][3];
        if ( $dist < $rsum ) {
            print TXT "Overlap for $atmnb{$f1}[$i] and $atmnb{$f2}[$j]
+\n";
        }
        else {
            print PDB "$input{$f1}[$i]";
        }
    }
}
[download]

update: added the [$i] [$j] indexes when using the $coord array in the computation/reporting loop. (Shouldn't have forgotten that...)

Another update: regarding optimization, I don't know how much it would save, but in that N1 * N2 nested loop for measuring the distances, you could extract the "f1" data into simpler storage, and save the work of having to dereference the "f1" data structure on every iteration of the inner loop -- ie:

for my $i ( 0 .. $natms{$f1}-1 ) {

    my $xyz = @{$coord{$f1}[$i]};
    my $rad = pop @xyz;

    for my $j ( 0 .. $natms{$f2}-1 ) {
# use map to compute the x,y,z diffs just once each:
        my @difs = map { $xyz[$_] - $coord{$f2}[$j][$_] } ( 0, 1, 2 );
        my $dist = sqrt( $difs[0] * $difs[0] +
                         $difs[1] * $difs[1] +
                         $difs[2] * $difs[2] );
        my $rsum = $rad + $coord{$f2}[$j][3];
        if ( $dist < $rsum ) {
            print TXT "Overlap for $atmnb{$f1}[$i] and $atmnb{$f2}[$j]
+\n";
        }
        else {
            print PDB $input{$f1}[$i];
        }
    }
}
[download]

Comment on Re: checking for overlap Select or Download Code