Hello everyone, I am trying to write a Perl script that will do the following.

It will read a PDB file that contains only CA (alpha carbon) atoms, like this:

ATOM      1  CA  PRO A 889      84.370  72.820  26.830  1.00  0.00
ATOM      2  CA  THR A 890      87.370  73.900  28.080  1.00  0.00
ATOM      3  CA  VAL A 891      90.920  72.490  27.750  1.00  0.00
ATOM      4  CA  PHE A 892      93.640  74.890  28.970  1.00  0.00
ATOM      5  CA  HIS B 893      97.060  74.200  27.360  1.00  0.00
ATOM      6  CA  LYS B 894      99.880  73.920  29.990  1.00  0.00

It will then read a second PDB file that contains every atom:

ATOM      1  N   PRO A 889      16.220  12.185   1.804  1.00 71.54           N
ATOM      2  CA  PRO A 889      16.101  12.990   3.034  1.00 70.89           C
ATOM      3  C   PRO A 889      15.432  14.346   2.803  1.00 72.31           C
ATOM      4  O   PRO A 889      14.743  14.852   3.703  1.00 72.20           O
ATOM      5  CB  PRO A 889      17.553  13.151   3.502  1.00 72.96           C
ATOM      6  CG  PRO A 889      18.315  12.067   2.782  1.00 78.00           C
ATOM      7  CD  PRO A 889      17.626  11.907   1.465  1.00 73.35           C

(The files refer to the same molecule but have a different number of lines.)
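A minimal sketch of pulling the chain letter and residue number out of one record, assuming the fields are whitespace-separated as in the samples above. The helper name `chain_and_resnum` is made up for illustration, and the approach would not work on records where the chain letter and a four-digit residue number run together.

```perl
use strict;
use warnings;

# Hypothetical helper: split one ATOM record on whitespace and return
# (chain letter, residue number), or an empty list for any other line.
sub chain_and_resnum {
    my ($record) = @_;
    my @field = split ' ', $record;
    return unless @field >= 6 && $field[0] eq 'ATOM';
    return ($field[4], $field[5]);    # fields 4 and 5, counting from 0
}

my ($chain, $resnum) = chain_and_resnum(
    'ATOM      5  CA  HIS B 893      97.060  74.200  27.360  1.00  0.00');
print "chain=$chain resnum=$resnum\n";    # prints "chain=B resnum=893"
```

Counting the whitespace-separated fields from 0 puts the chain letter in field 4 and the residue number in field 5.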

If a residue number (column 5) in the second file also appears in the first file, the script should take the chain letter (column 4) from the first file and use it to replace the chain letter on every line of the second file that has that residue number. So far I've got this disaster :/

use strict;
use warnings;

print "\nEnter the network pdb file: ";
chomp(my $inputFile = <STDIN>);
my $netFh;
unless (open($netFh, '<', $inputFile)) {
    print "Cannot read from '$inputFile'";
    <STDIN>;
    exit;
}
# load the file into an array
chomp(my @networkpdb = <$netFh>);
# close the file
close($netFh);

print "\nEnter the pdb output file: ";
chomp(my $inputFile2 = <STDIN>);
my $pdbFh;
unless (open($pdbFh, '<', $inputFile2)) {
    print "Cannot read from '$inputFile2'";
    <STDIN>;
    exit;
}
chomp(my @pdb = <$pdbFh>);
close($pdbFh);

my %parsedData;
for my $netLine (@networkpdb) {
    # no /g modifier: in scalar context it remembers the match position,
    # so repeated matches against the same string would start failing
    next unless $netLine =~ m/ATOM\s+\d+\s+\w+\s+\w{3}\s*(\w+)\s*(\d*)\s+\S+\.\S+\s+\S+\.\S+\s+\S+\.\S+\s+.+\..+\..*/i;
    my ($chain, $resnum) = ($1, $2);
    for my $line (0 .. $#pdb) {
        next unless $pdb[$line] =~ m/(ATOM\s+\d+\s+\w+\s+\w{3}\s*)(\w+)\s*(\d*)(\s+\S+\.\S+\s+\S+\.\S+\s+\S+\.\S+\s+.+\..+\..*)/i;
        my ($begining, $resnum1, $end) = ($1, $3, $4);
        # compare with eq (a single = would assign) and join with a
        # literal space ("\s" is not a space inside a string)
        if ($resnum1 eq $resnum) {
            $parsedData{$line} = $begining.$chain." ".$resnum1.$end;
        }
    }
}

# create the output file name
my $outputFile = "WithNetwork_".$inputFile;
# open the output file
open(my $outFh, '>', $outputFile) or die "Cannot write to '$outputFile': $!\n";
# print the data lines
foreach my $line (sort {$a <=> $b} keys %parsedData) {
    print $outFh $parsedData{$line}."\n";
}
# close the output file
close($outFh);
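For comparison, a minimal sketch of the same replacement done with a hash lookup instead of nested loops, so each file is scanned only once. The helper name `relabel_chains` and the full-atom demo records (including their coordinates) are made up for illustration. It assumes whitespace-separated fields, a single-character chain letter, and residue numbers that are unique across chains in the CA-only file.

```perl
use strict;
use warnings;

# Hypothetical helper: build a residue-number => chain-letter map from the
# CA-only records, then overwrite the chain letter of every matching ATOM
# record in the full-atom file. Other lines pass through unchanged.
sub relabel_chains {
    my ($ca_lines, $all_lines) = @_;

    my %chain_for;
    for my $line (@$ca_lines) {
        my @f = split ' ', $line;
        next unless @f >= 6 && $f[0] eq 'ATOM';
        $chain_for{ $f[5] } = $f[4];
    }

    my @out;
    for my $line (@$all_lines) {
        my $copy = $line;
        # $1 = everything before the chain letter, $2 = chain, $4 = residue number
        if ($copy =~ /^(ATOM\s+\d+\s+\S+\s+\w{3}\s+)(\w)(\s*)(\d+)/
            && exists $chain_for{$4}) {
            # replace the one chain character in place, keeping the column layout
            substr($copy, length($1), 1, $chain_for{$4});
        }
        push @out, $copy;
    }
    return @out;
}

# Demo: the two full-atom records below are invented sample data.
my @ca  = ('ATOM      5  CA  HIS B 893      97.060  74.200  27.360  1.00  0.00');
my @all = ('ATOM     35  N   HIS A 893      10.000  11.000  12.000  1.00 50.00           N',
           'ATOM     36  CA  HIS A 893      10.500  11.500  12.500  1.00 50.00           C');
print "$_\n" for relabel_chains(\@ca, \@all);
```

In the demo both records for residue 893 come out with chain B. Because only the chain character is overwritten, the rest of each line keeps its original spacing, and lines with no matching residue number are written out untouched rather than dropped.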

Thank you very much in advance.


In reply to Perl script that will read two pdb files with different line numbers and will replace the chain letter from the first to the second file by Nastazia
