nastaziales has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am new to Perl and I am trying to write code that accepts a PDB file, extracts all the information (atom number, atom type, residue name, residue number, x, y, z, B factor), renumbers the residues, and saves the new PDB to a new file. I can't find a way to use a loop with a string array. This is the code:
print "\nEnter the input file: ";
$inputFile = <STDIN>;
chomp $inputFile;
unless (open(INPUTFILE, $inputFile)) {
    print "Cannot read from '$inputFile'.\nProgram closing.\n";
    <STDIN>;
    exit;
}
chomp(@dataArray = <INPUTFILE>);
close(INPUTFILE);
for ($line = 0; $line <= scalar @dataArray; $line++) {
    if ($dataArray[$line] =~ m/ATOM\s+(\d+)\s+(\w+)\s+(\w{3})\s+(\w)+\s+(\d+)\s+(\S+\.\S+)\s+(\S+\.\S+)\s+(\S+\.\S+)\s+(.+\S)(.\d\d+\.\d\d.+)/ig) {
        $m1=$1; $m2=$2; $m3=$3; $m5=$5; $m6=$6; $m7=$7; $m8=$8; $m9=$9; $m10=$10;
        push(@m3,$m3);
        push(@m5,$m5);
        foreach $line (@m3,@m5) {
            if ($m3[$line] eq $m3[$line+1]) {$m5[$line]=$m5[$line+1];}
            elsif ($m3[$line] ne $m3[$line+1]){$m5[$line+1]=$m5[$line]+1;}
        }
        $~="PDBFORMAT";
format PDBFORMAT =
ATOM @|||| @||| @|| @||| @|||||| @|||||| @|||||| @>>>>> @>>>>>
$m1, $m2, $m3,$m5, $m6, $m7, $m8, $m9, $m10
.
        open(PDBFORMAT,">>my2pdb.txt") or die "Can't open anything";
        write PDBFORMAT;
    }
}
close PDBFORMAT;

Replies are listed 'Best First'.
Re: Rearrange the residue number of a pdb file according to the residues names
by QM (Parson) on Jun 18, 2015 at 12:26 UTC
    On the face of it, your problem is easily solved with Perl. You just need a little help with the jargon.

    use strict; use warnings;

    ...would show you that $m51 is not defined on line 39. This is probably a typo -- please use real code :D.

    It is not clear what the script should be doing, other than rewriting all the ATOM lines, without field 4.

    This snippet (reformatted for "standard" readability):

    foreach $line (@m3,@m5){
        if ($m3[$line] eq $m3[$line+1]) {
            $m5[i]=$m5[i+1];
        }
        elsif ($m3[$line] ne $m3[$line+1]) {
            $m5[i+1]=$m5[i]+1;
        }
    }
    ...tries to index @m3 by using one of its values. According to your regex, $m3 is \w{3}, a residue name, so when used as an index it will probably evaluate to 0. Also, using "i" as an index should not compile under use strict.
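
A minimal sketch of the difference between looping over values and looping over positions. The sub names element_at and count_changes and the sample list are hypothetical and are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical residue names, standing in for the contents of @m3.
my @names = ('ARG', 'ARG', 'LEU');

# Use a value as if it were an index. A non-numeric string such as 'LEU'
# is treated as 0, so this always returns the first element.
sub element_at {
    my ($list, $index) = @_;
    no warnings 'numeric';    # silence the "isn't numeric" warning for the demo
    return $list->[$index];
}

# Loop over positions instead, comparing each element with the previous one.
sub count_changes {
    my (@list) = @_;
    my $changes = 0;
    for my $i (1 .. $#list) {
        $changes++ if $list[$i] ne $list[$i - 1];
    }
    return $changes;
}

print element_at(\@names, 'LEU'), "\n";    # prints ARG, not LEU
print count_changes(@names), "\n";         # prints 1
```

Iterating over 1 .. $#list keeps $i a real position, so $list[$i - 1] is always the neighbouring element.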

    Please show us your sample script, and a sample input and output (what you got, and what you really want). We can then walk you through this.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      ATOM 310 1HH2 ARG A 607 -31.278 29.882 25.723 1.00 99.99
      ATOM 311 2HH2 ARG A 607 -32.344 30.932 24.851 1.00 99.99
      ATOM 312 N LEU A 608 -36.327 31.914 18.187 1.00 65.62
      ATOM 313 CA LEU A 608 -37.435 32.634 17.559 1.00 67.47
      ATOM 314 C LEU A 608 -38.434 33.052 18.624 1.00 74.29
      ATOM 315 O LEU A 608 -38.982 32.201 19.331 1.00 71.12
      ATOM 316 CB LEU A 608 -38.110 31.803 16.459 1.00 64.64
      ATOM 317 CG LEU A 608 -39.261 32.481 15.719 1.00 71.07
      ATOM 318 CD1 LEU A 608 -38.782 33.704 14.929 1.00 73.68
      ATOM 319 CD2 LEU A 608 -39.981 31.498 14.829 1.00 69.63
      ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99
      ATOM 321 N ARG A 565 -38.634 34.587 18.911 1.00 22.27
      ATOM 322 CA ARG A 565 -39.655 35.200 19.766 1.00 23.04
      ATOM 323 C ARG A 565 -40.963 35.104 19.007 1.00 22.72
      ATOM 324 O ARG A 565 -41.046 35.500 17.847 1.00 24.21
      ATOM 325 CB ARG A 565 -39.275 36.643 20.105 1.00 99.99
      ATOM 326 CG ARG A 565 -38.044 36.770 20.986 1.00 99.99
      This is an input example. The 4th column is the residue name and the 6th is the residue number. In the input there is a discontinuity in the residue number: when the residue name changes, the residue number should be the previous residue number plus 1. So I need a script that will take a PDB file with discontinuities and produce a PDB file in the same format but with continuous residue numbers.
        I need a script that will take a pdb file with discontinuities and will give a pdb file with the same format but with a continuous res number

        I'm assuming here that every time the residue name changes, the residue number is incremented. And in fact, that later on, the same residue name may occur again (with some intervening lines with different residue names), but should have a different number.

        I would rework your original script to something like this.

        #!/usr/bin/env perl
        # Read a PDB file and change the residue numbers to be continuous.
        use strict;
        use warnings;

        print "\nEnter the PDB input file: ";
        my $inputFile = <STDIN>;
        chomp $inputFile;
        unless (open(INPUTFILE, "<", $inputFile)) {
            die "Cannot read from <$inputFile>, $!";
        }

        my $output_file = "my2pdb.txt";
        open(PDBOUT, ">>", $output_file) or die "Cannot open <$output_file> for writing, $!";

        my $last_residue_name = '';
        my $last_residue_number = 0;

        while (<INPUTFILE>) {
            if (m/^ATOM/) {
                my @m = split;
                # Increment the residue number if the residue name changes
                if ($m[3] eq $last_residue_name) {
                    $m[5] = $last_residue_number;
                }
                else {
                    $m[5] = ++$last_residue_number;
                    $last_residue_name = $m[3];
                }
                printf PDBOUT "%4s %5s %4s %3s %1s %4s %7s %7s %7s %6s %6s\n", @m;
            }
        }

        exit;

        Here's the input file I used:

        ATOM 310 1HH2 ARG A 607 -31.278 29.882 25.723 1.00 99.99
        ATOM 311 2HH2 ARG A 607 -32.344 30.932 24.851 1.00 99.99
        ATOM 312 N LEU A 608 -36.327 31.914 18.187 1.00 65.62
        ATOM 313 CA LEU A 608 -37.435 32.634 17.559 1.00 67.47
        ATOM 314 C LEU A 608 -38.434 33.052 18.624 1.00 74.29
        ATOM 315 O LEU A 608 -38.982 32.201 19.331 1.00 71.12
        ATOM 316 CB LEU A 608 -38.110 31.803 16.459 1.00 64.64
        ATOM 317 CG LEU A 608 -39.261 32.481 15.719 1.00 71.07
        ATOM 318 CD1 LEU A 608 -38.782 33.704 14.929 1.00 73.68
        ATOM 319 CD2 LEU A 608 -39.981 31.498 14.829 1.00 69.63
        ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99
        ATOM 321 N ARG A 565 -38.634 34.587 18.911 1.00 22.27
        ATOM 322 CA ARG A 565 -39.655 35.200 19.766 1.00 23.04
        ATOM 323 C ARG A 565 -40.963 35.104 19.007 1.00 22.72
        ATOM 324 O ARG A 565 -41.046 35.500 17.847 1.00 24.21
        ATOM 325 CB ARG A 565 -39.275 36.643 20.105 1.00 99.99
        ATOM 326 CG ARG A 565 -38.044 36.770 20.986 1.00 99.99

        And here's the output I got (note that column 5 is included, unlike your initial example):

        ATOM 310 1HH2 ARG A 1 -31.278 29.882 25.723 1.00 99.99
        ATOM 311 2HH2 ARG A 1 -32.344 30.932 24.851 1.00 99.99
        ATOM 312 N LEU A 2 -36.327 31.914 18.187 1.00 65.62
        ATOM 313 CA LEU A 2 -37.435 32.634 17.559 1.00 67.47
        ATOM 314 C LEU A 2 -38.434 33.052 18.624 1.00 74.29
        ATOM 315 O LEU A 2 -38.982 32.201 19.331 1.00 71.12
        ATOM 316 CB LEU A 2 -38.110 31.803 16.459 1.00 64.64
        ATOM 317 CG LEU A 2 -39.261 32.481 15.719 1.00 71.07
        ATOM 318 CD1 LEU A 2 -38.782 33.704 14.929 1.00 73.68
        ATOM 319 CD2 LEU A 2 -39.981 31.498 14.829 1.00 69.63
        ATOM 320 H LEU A 2 -36.638 31.041 18.563 1.00 99.99
        ATOM 321 N ARG A 3 -38.634 34.587 18.911 1.00 22.27
        ATOM 322 CA ARG A 3 -39.655 35.200 19.766 1.00 23.04
        ATOM 323 C ARG A 3 -40.963 35.104 19.007 1.00 22.72
        ATOM 324 O ARG A 3 -41.046 35.500 17.847 1.00 24.21
        ATOM 325 CB ARG A 3 -39.275 36.643 20.105 1.00 99.99
        ATOM 326 CG ARG A 3 -38.044 36.770 20.986 1.00 99.99

        (I don't do formats, you'll have to figure that out for yourself.)

        Now that still begs the question of whether you wanted the entire input file written to the output, or just the ATOM section.
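
If the whole file should be written out, one possible sketch is to renumber only the ATOM lines and pass every other line through unchanged. The sub name renumber_lines and the HEADER/END lines in the usage are hypothetical; the sketch assumes whitespace-separated fields with the residue name in field 4 and the residue number in field 6, as in the samples posted in this thread.

```perl
use strict;
use warnings;

# Renumber residues on ATOM lines; return every other line unchanged.
# Assumption: fields are separated by whitespace, residue name is field 4
# and residue number is field 6.
sub renumber_lines {
    my (@lines) = @_;
    my ($last_name, $number) = ('', 0);
    my @out;
    for my $line (@lines) {
        # $1 = everything up to the residue number, $2 = residue name,
        # $3 = the rest of the line (including the newline)
        if ($line =~ /^(ATOM\s+\S+\s+\S+\s+(\S+)\s+\S+\s+)\d+(.*)$/s) {
            my ($head, $name, $tail) = ($1, $2, $3);
            $number++ if $name ne $last_name;
            $last_name = $name;
            $line = $head . $number . $tail;
        }
        push @out, $line;
    }
    return @out;
}

# Usage with a few hypothetical lines:
print renumber_lines(
    "HEADER example\n",
    "ATOM 311 2HH2 ARG A 607 -32.344 30.932 24.851 1.00 99.99\n",
    "ATOM 312 N LEU A 608 -36.327 31.914 18.187 1.00 65.62\n",
    "END\n",
);
```

Only the residue number is substituted, so the spacing of the rest of each line is kept as it was read, apart from any change in the number of digits.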

        I'm assuming I've missed several hints about the final output, but this is the basic structure.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: Rearrange the residue number of a pdb file according to the residues names
by erix (Prior) on Jun 18, 2015 at 12:07 UTC

    PDB file format description might be useful. (if not for the OP then for those wanting to know more)
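
One practical reason the format description matters: ATOM records are fixed-width, so adjacent fields can touch. A B-factor of 100 or more leaves no space after the occupancy (one sample line in this thread ends in 1.00107.02), and a whitespace split then returns one field fewer. The following is a minimal sketch that reads and rewrites the residue number by column position instead. The sub names are hypothetical, and it assumes the input keeps the standard column layout (residue name in columns 18-20, residue number in columns 23-26), which the whitespace-collapsed samples posted here cannot confirm.

```perl
use strict;
use warnings;

# Offsets are 0-based; they correspond to the standard ATOM record layout:
# residue name in columns 18-20, residue number in columns 23-26.
# Assumption: the input keeps this fixed-width layout.

# Return (residue name, residue number) taken from fixed columns.
sub residue_fields {
    my ($line) = @_;
    my $name   = substr($line, 17, 3);
    my $number = substr($line, 22, 4);
    $number =~ s/^\s+//;    # strip the left padding
    return ($name, $number);
}

# Return a copy of the line with only the residue number replaced,
# right-aligned in its four columns; all other columns stay untouched.
sub set_residue_number {
    my ($line, $new_number) = @_;
    substr($line, 22, 4, sprintf('%4d', $new_number));
    return $line;
}
```

Because only four characters are overwritten, the coordinates, occupancy and B-factor columns stay exactly where they were, which a split-and-reprint approach does not guarantee.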

Re: Rearrange the residue number of a pdb file according to the residues names
by pme (Monsignor) on Jun 18, 2015 at 11:36 UTC
    Hi nastaziales,

    Welcome to the monastery! Could you explain what to do with $m3 and $m5?

      an example of a pdb file:
      ATOM 1 N LEU A 579 -44.254 33.292 -5.352 1.00 92.25
      ATOM 2 CA LEU A 579 -43.296 32.939 -4.304 1.00 88.51
      ATOM 3 C LEU A 579 -43.865 33.305 -2.916 1.00 94.93
      ATOM 4 O LEU A 579 -44.226 34.459 -2.677 1.00 97.79
      ATOM 5 CB LEU A 579 -41.945 33.627 -4.567 1.00 88.70
      ATOM 6 CG LEU A 579 -40.668 32.828 -4.289 1.00 88.54
      ATOM 7 CD1 LEU A 579 -40.298 31.909 -5.470 1.00 87.89
      ATOM 8 CD2 LEU A 579 -39.515 33.759 -4.003 1.00 90.97
      ATOM 9 1H LEU A 579 -45.070 33.694 -4.937 1.00 99.99
      ATOM 10 2H LEU A 579 -43.920 33.952 -6.025 1.00 99.99
      ATOM 11 3H LEU A 579 -44.567 32.526 -5.914 1.00 99.99
      ATOM 12 N SER A 580 -43.966 32.299 -2.017 1.00 89.98
      ATOM 13 CA SER A 580 -44.572 32.440 -0.687 1.00 90.11
      ATOM 14 C SER A 580 -43.866 31.761 0.494 1.00 88.81
      ATOM 15 O SER A 580 -43.564 30.563 0.472 1.00 84.43
      ATOM 16 CB SER A 580 -46.046 32.049 -0.724 1.00 96.32
      ATOM 17 OG SER A 580 -46.293 31.048 -1.698 1.00107.02
      ATOM 18 H SER A 580 -43.643 31.364 -2.161 1.00 99.99
      The 4th column is the residue name and the 6th is the residue number. I need a script that will renumber the residues according to the residue name: when the residue name on a line is the same as on the previous line, the residue number stays the same; when it changes, the residue number is incremented by 1.
        Can you give an example input and desired output?

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

        m3 is the residue name and m5 the residue number