Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file

Nastazia has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone, I am trying to write a script in Perl that does the following.

It will read a PDB file that contains only CA atoms, like the following:

1         2   3   4  5  6   
ATOM      1  CA  PRO A 889      84.370  72.820  26.830  1.00  0.00
ATOM      2  CA  THR A 890      87.370  73.900  28.080  1.00  0.00
ATOM      3  CA  VAL A 891      90.920  72.490  27.750  1.00  0.00
ATOM      4  CA  PHE A 892      93.640  74.890  28.970  1.00  0.00
ATOM      5  CA  HIS B 893      97.060  74.200  27.360  1.00  0.00
ATOM      6  CA  LYS B 894      99.880  73.920  29.990  1.00  0.00

It will then read a second PDB file that contains every atom:

1         2  3    4  5  6
ATOM      1  N   PRO A 889      16.220  12.185   1.804  1.00 71.54           N
ATOM      2  CA  PRO A 889      16.101  12.990   3.034  1.00 70.89           C
ATOM      3  C   PRO A 889      15.432  14.346   2.803  1.00 72.31           C
ATOM      4  O   PRO A 889      14.743  14.852   3.703  1.00 72.20           O
ATOM      5  CB  PRO A 889      17.553  13.151   3.502  1.00 72.96           C
ATOM      6  CG  PRO A 889      18.315  12.067   2.782  1.00 78.00           C
ATOM      7  CD  PRO A 889      17.626  11.907   1.465  1.00 73.35           C

(The files describe the same molecule but have different numbers of lines.)

So if the residue number (column 5) is the same, it should take the chain letter (column 4) from the first file and use it to replace the chain letter on every line of the second file that has that residue number. So far I've got this disaster :/

print "\nEnter the network pdb file file: ";
$inputFile = <STDIN>;
chomp $inputFile;

unless (open(INPUTFILE, $inputFile)) {
    print "Cannot read from '$inputFile'";
    <STDIN>;
    exit;
}
# load the file into an array
chomp(@networkpdb = <INPUTFILE>);

# close the file
close(INPUTFILE);

print "\nEnter the pdb output file: ";
$inputFile2 = <STDIN>;
chomp $inputFile2;

unless (open(INPUTFILE, $inputFile2)) {
    print "Cannot read from '$inputFile2'";
    <STDIN>;
    exit;
}

chomp(@pdb = <INPUTFILE>);


close(INPUTFILE);


            
for ($line1 = 0; $line1 < scalar @networkpdb; $line1++) {
    if ($networkpdb[$line1] =~ m/ATOM\s+\d+\s+\w+\s+\w{3}\s*(\w+)\s*(\d*)\s+\S+\.\S+\s+\S+\.\S+\s+\S+\.\S+\s+.+\..+\..*/ig) {
               my  $resnum=$2;
               my  $chain=$1;
for ($line = 0; $line < scalar @pdb; $line++) {
     if ($pdb[$line]=~ m/(ATOM\s+\d+\s+\w+\s+\w{3}\s*)(\w+)\s*(\d*)(\s+\S+\.\S+\s+\S+\.\S+\s+\S+\.\S+\s+.+\..+\..*)/ig) {
                my $begining=$1;
                my  $resnum1=$3;
                my  $chain1=$2;
                my $end=$4;
 if ($resnum1=$resnum)
                 {$chain1=$chain;
 $parsedData{$line} = $begining.$chain1."\s".$resnum1.$end;
        
    
}}}}}

# create the output file name
$outputFile = "WithNetwork_".$inputFile;

# open the output file
open (OUTFILE, ">$outputFile");
# print the data lines
foreach $line (sort {$a <=> $b} keys %parsedData) {
    print OUTFILE $parsedData{$line}."\n";
}

# close the output file
close (OUTFILE);

Thank you very much in advance.


Replies are listed 'Best First'.
Re: Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file by hippo (Archbishop) on Jun 22, 2018 at 10:18 UTC
`if ($resnum1=$resnum)` Inside the brackets is an assignment. You almost certainly don't want to do that but instead test for equality, i.e. `if ($resnum1 == $resnum)`. `==` is for comparing numbers and `eq` is for comparing strings. Is there any particular reason you use those massive regexes in preference to a simple split? That might make things a little clearer. Other tips: use strict and warnings, replace `print ... exit` with die, and try to use consistent indenting to make your code more legible (this really does help). Good luck.
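A minimal sketch of the split-based parsing and the `==` test suggested above. It assumes the first six fields of every ATOM line are separated by whitespace, as they are in the posted samples; the helper name `chain_and_resnum` is made up for illustration. In real PDB files a four-digit residue number sits directly against the chain letter (for example `A1000`), and in that case a plain split would no longer work.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ATOM line on whitespace and return
# (chain letter, residue number), or an empty list for other lines.
sub chain_and_resnum {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields >= 6 && $fields[0] eq 'ATOM';
    return @fields[4, 5];    # 5th field = chain, 6th field = residue number
}

# One line from each of the two posted samples.
my $ca_line  = 'ATOM      5  CA  HIS B 893      97.060  74.200  27.360  1.00  0.00';
my $all_line = 'ATOM      1  N   PRO A 889      16.220  12.185   1.804  1.00 71.54           N';

my ($chain,  $resnum)  = chain_and_resnum($ca_line);
my ($chain1, $resnum1) = chain_and_resnum($all_line);

# '==' compares numbers. A single '=' would assign $resnum to $resnum1
# and be true for every non-zero residue number.
if ($resnum1 == $resnum) {
    print "residue $resnum1 matches, use chain $chain\n";
}
else {
    print "residue $resnum1 does not match $resnum\n";
}
```

With these two lines the residue numbers differ (889 and 893), so the `else` branch runs; with the original `=` the first branch would have run instead.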
Re: Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file by Laurent_R (Canon) on Jun 22, 2018 at 12:09 UTC
In addition to hippo's comments, please note that nested loops are likely to cripple performance if your files are even moderately large. You should probably store the values of interest from the first file in a hash and then look them up in that hash while reading the second file. This will be faster and easier to implement.
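The hash lookup could be sketched roughly as follows. This is one possible shape, not a definitive implementation. It assumes both files follow the standard fixed-column PDB layout (chain letter in column 22, residue number in columns 23 to 26), which the posted sample lines match. The names `build_chain_map` and `relabel_chains` are hypothetical, and the chain letter `X` in the demo all-atom lines is made up so that the replacement is visible.

```perl
use strict;
use warnings;

# Build a lookup of residue number => chain letter from the CA-only lines.
sub build_chain_map {
    my (@ca_lines) = @_;
    my %chain_for;
    for my $line (@ca_lines) {
        next unless $line =~ /^ATOM/ && length($line) >= 26;
        my $chain  = substr $line, 21, 1;    # column 22
        my $resnum = substr $line, 22, 4;    # columns 23-26
        $resnum =~ s/\s+//g;
        $chain_for{$resnum} = $chain;
    }
    return \%chain_for;
}

# Return copies of the all-atom lines, with the chain letter replaced
# wherever the residue number is present in the lookup.
sub relabel_chains {
    my ($chain_for, @pdb_lines) = @_;
    for my $line (@pdb_lines) {
        next unless $line =~ /^ATOM/ && length($line) >= 26;
        my $resnum = substr $line, 22, 4;
        $resnum =~ s/\s+//g;
        substr($line, 21, 1, $chain_for->{$resnum})
            if exists $chain_for->{$resnum};
    }
    return @pdb_lines;
}

# Demo: CA lines from the question; all-atom lines with a made-up chain 'X'.
my @ca = (
    'ATOM      1  CA  PRO A 889      84.370  72.820  26.830  1.00  0.00',
    'ATOM      5  CA  HIS B 893      97.060  74.200  27.360  1.00  0.00',
);
my @all = (
    'ATOM      1  N   PRO X 889      16.220  12.185   1.804  1.00 71.54           N',
    'ATOM      2  CA  PRO X 889      16.101  12.990   3.034  1.00 70.89           C',
);

print "$_\n" for relabel_chains(build_chain_map(@ca), @all);
```

Each file is read once, so the work grows with the total number of lines rather than with the product of the two line counts. Replacing a single character with `substr` also leaves the rest of each line, including its column alignment, untouched.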
Re: Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file by talexb (Chancellor) on Jun 22, 2018 at 14:25 UTC
Another approach would be to dump the information into two database tables, and have the database do the heavy lifting.

Alex / talexb / Toronto
Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.