aria has asked for the wisdom of the Perl Monks concerning the following question:

Hi all! I'm new to Perl and I'm trying to use it to renumber the entries of a PDB file (sample below):

ATOM 9 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H

ATOM 10 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O

ATOM 11 C5' THY C 1 108.813 35.973 -8.710 1.00 0.00 N1 C

ATOM 12 H5' THY C 1 109.513 35.493 -9.239 0.00 0.00 N1 H

ATOM 13 H5'' THY C 1 108.495 35.550 -7.861 0.00 0.00 N1 H

ATOM 14 N1 THY C 1 107.956 38.157 -5.232 1.00 0.00 N1 N

ATOM 15 C6 THY C 1 107.862 39.006 -4.149 1.00 0.00 N1 C

ATOM 16 H6 THY C 1 108.479 39.755 -3.910 0.00 0.00 N1 H

ATOM 17 C2 THY C 1 107.024 37.166 -5.449 1.00 0.00 N1 C

ATOM 18 O2 THY C 1 107.071 36.398 -6.397 1.00 0.00 N1 O

ATOM 19 N3 THY C 1 106.027 37.104 -4.506 1.00 0.00 N1 N

So I want to renumber the entries in the second column (next to ATOM), starting them all sequentially from 1. This is what I've written so far:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $pdb;
    my $line;
    my @columns;

    open $pdb, '<', "structure.pdb" or die $!;

    # match string and select column
    my $string = "ATOM";   # match for ATOM
    my @nth_column;
    my $n = 1;             # select 2nd column

    while ($line = <$pdb>) {
        if ($line =~ $string) {
            chomp $line;
            @columns = split /\s+/, $line;
            push @nth_column, $columns[$n];
            # for testing:
            print "$columns[$n] \n";
        }
    }

So I am printing the correct column but I'm having a bit of a brain fart because I can't figure out how to actually do the renumbering? Would I match for a digit using regex and then introduce a counter? Does that sound reasonable?

Thanks in advance!!

Replies are listed 'Best First'.
Re: renumber entries in column?
by BillKSmith (Monsignor) on Jun 04, 2018 at 14:30 UTC
    Perl's command switches (perlrun) can be a big help.
    C:\Users\Bill\forums\monks>type aria.dat
    ATOM 9 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H
    ATOM 10 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O
    ATOM 11 C5' THY C 1 108.813 35.973 -8.710 1.00 0.00 N1 C
    ATOM 12 H5' THY C 1 109.513 35.493 -9.239 0.00 0.00 N1 H
    ATOM 13 H5'' THY C 1 108.495 35.550 -7.861 0.00 0.00 N1 H
    ATOM 14 N1 THY C 1 107.956 38.157 -5.232 1.00 0.00 N1 N
    ATOM 15 C6 THY C 1 107.862 39.006 -4.149 1.00 0.00 N1 C
    ATOM 16 H6 THY C 1 108.479 39.755 -3.910 0.00 0.00 N1 H
    ATOM 17 C2 THY C 1 107.024 37.166 -5.449 1.00 0.00 N1 C
    ATOM 18 O2 THY C 1 107.071 36.398 -6.397 1.00 0.00 N1 O
    ATOM 19 N3 THY C 1 106.027 37.104 -4.506 1.00 0.00 N1 N

    C:\Users\Bill\forums\monks>type aria.pl
    #!perl -pibak
    use feature 'state';
    state $num = 0;
    $num++;
    s/^ATOM\s\d+/ATOM $num/;

    C:\Users\Bill\forums\monks>perl aria.pl aria.dat

    C:\Users\Bill\forums\monks>type aria.dat
    ATOM 1 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H
    ATOM 2 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O
    ATOM 3 C5' THY C 1 108.813 35.973 -8.710 1.00 0.00 N1 C
    ATOM 4 H5' THY C 1 109.513 35.493 -9.239 0.00 0.00 N1 H
    ATOM 5 H5'' THY C 1 108.495 35.550 -7.861 0.00 0.00 N1 H
    ATOM 6 N1 THY C 1 107.956 38.157 -5.232 1.00 0.00 N1 N
    ATOM 7 C6 THY C 1 107.862 39.006 -4.149 1.00 0.00 N1 C
    ATOM 8 H6 THY C 1 108.479 39.755 -3.910 0.00 0.00 N1 H
    ATOM 9 C2 THY C 1 107.024 37.166 -5.449 1.00 0.00 N1 C
    ATOM 10 O2 THY C 1 107.071 36.398 -6.397 1.00 0.00 N1 O
    ATOM 11 N3 THY C 1 106.027 37.104 -4.506 1.00 0.00 N1 N

    C:\Users\Bill\forums\monks>type aria.datbak
    ATOM 9 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H
    ATOM 10 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O
    ATOM 11 C5' THY C 1 108.813 35.973 -8.710 1.00 0.00 N1 C
    ATOM 12 H5' THY C 1 109.513 35.493 -9.239 0.00 0.00 N1 H
    ATOM 13 H5'' THY C 1 108.495 35.550 -7.861 0.00 0.00 N1 H
    ATOM 14 N1 THY C 1 107.956 38.157 -5.232 1.00 0.00 N1 N
    ATOM 15 C6 THY C 1 107.862 39.006 -4.149 1.00 0.00 N1 C
    ATOM 16 H6 THY C 1 108.479 39.755 -3.910 0.00 0.00 N1 H
    ATOM 17 C2 THY C 1 107.024 37.166 -5.449 1.00 0.00 N1 C
    ATOM 18 O2 THY C 1 107.071 36.398 -6.397 1.00 0.00 N1 O
    ATOM 19 N3 THY C 1 106.027 37.104 -4.506 1.00 0.00 N1 N
    Bill
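One thing to watch with a counter that advances on every input line: if the file contains blank or non-ATOM lines (as the sample in the question does), those lines use up numbers too. The following is only a minimal sketch of one way around that, not BillKSmith's script; the sub name `renumber_atoms` and the in-memory sample lines are made up for illustration. It increments the counter inside the substitution, so it advances only when a line really matches.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): renumber the serial on
# ATOM lines only. Takes a list of lines, returns renumbered copies.
sub renumber_atoms {
    my @lines = @_;
    my $num = 0;
    for my $line (@lines) {
        # /e evaluates the replacement as code, so ++$num runs only
        # when the line starts with "ATOM <digits>".
        $line =~ s/^(ATOM\s+)\d+/$1 . ++$num/e;
    }
    return @lines;
}

# Sample input with a blank line between records, as in the question.
my @in = (
    "ATOM 9 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H\n",
    "\n",
    "ATOM 10 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O\n",
);
print renumber_atoms(@in);
```

The blank line passes through untouched and the two ATOM lines come out numbered 1 and 2.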
Re: renumber entries in column?
by trippledubs (Deacon) on Jun 04, 2018 at 13:54 UTC

    If they are already sequential, subtract 8.

    print $columns[$n]-8 ,"\n";

    Or adding a counter is reasonable. Since you don't change $n, you could repurpose that as your counter and change the data you started with.

    #for testing:
    $columns[1] = $n++;
    print "$columns[1] \n";
    # Print the whole array with new numbering
    print "@columns\n";

    Or, without changing the original data, use the postincrement operator:

    #for testing
    print $n++,"\n";
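Putting the counter idea together with the split from the original question, a complete version might look roughly like the sketch below. The sub name `renumber_column` and the sample lines are assumptions for illustration. Note that rejoining the fields with single spaces discards the original column spacing, which may matter if a downstream tool expects fixed-width PDB columns.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace field $col of every
# ATOM line with a running counter that starts at 1.
sub renumber_column {
    my ($col, @lines) = @_;
    my $count = 1;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^ATOM\b/) {
            # split ' ' splits on runs of whitespace and drops the newline
            my @columns = split ' ', $line;
            $columns[$col] = $count++;
            push @out, "@columns\n";      # rejoin with single spaces
        }
        else {
            push @out, $line;             # blank/other lines pass through
        }
    }
    return @out;
}

print renumber_column(
    1,
    "ATOM 9 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H\n",
    "\n",
    "ATOM 10 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O\n",
);
```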
Re: renumber entries in column?
by trippledubs (Deacon) on Jun 04, 2018 at 15:13 UTC

    BillKSmith reminded me, you can also use autosplit.

    perl -lane 'next if /^\s*$/; $F[1] -= 8; print "@F"' structure.pdb
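For readers unfamiliar with autosplit: with -a, Perl splits each input line on whitespace into the array @F before running the code. A rough script-form sketch of the same idea follows. It is an illustration only: the sub name `shift_serials` is hypothetical, and instead of hard-coding 8 it assumes the offset can be taken from the first ATOM line. Like the one-liner, it only shifts the existing numbers, so any gaps in the original numbering would remain.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): shift every ATOM serial so
# that the first one becomes 1. Mirrors -a by splitting each line into @F.
sub shift_serials {
    my @lines = @_;
    my $offset;
    my @out;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;          # skip blank lines
        my @F = split ' ', $line;          # what -a does for each line
        if ($F[0] eq 'ATOM') {
            # Work out the offset once, from the first ATOM line seen.
            $offset = $F[1] - 1 unless defined $offset;
            $F[1] -= $offset;
        }
        push @out, "@F\n";
    }
    return @out;
}

print shift_serials(
    "ATOM 9 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H\n",
    "\n",
    "ATOM 10 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O\n",
);
```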
Re: renumber entries in column?
by Marshall (Canon) on Jun 04, 2018 at 22:12 UTC
    Here is some example code. Your file has blank lines, i.e. lines other than ATOM lines.
    That is fine.
    What we want is a regex that modifies only the ATOM lines, giving them a new column #2 (index 1).
    In Perl, the replacement part of a substitution can be executable code when you add the /e flag.
    I think this does what you want?
    Note that in $1.$num++.$3 the "." is not the regex "dot"; it is the concatenation operator.
    Perl regex is extremely powerful!
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $num = 1;
    while (my $line = <DATA>) {
        $line =~ s/^(ATOM\s+)(\d+)(.*)/$1.$num++.$3/e;   # e flag means execute
        print $line;
    }

    =Prints
    ATOM 1 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H
    ATOM 2 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O
    ATOM 3 C5' THY C 1 108.813 35.973 -8.710 1.00 0.00 N1 C
    ATOM 4 H5' THY C 1 109.513 35.493 -9.239 0.00 0.00 N1 H
    ATOM 5 H5'' THY C 1 108.495 35.550 -7.861 0.00 0.00 N1 H
    ATOM 6 N1 THY C 1 107.956 38.157 -5.232 1.00 0.00 N1 N
    ATOM 7 C6 THY C 1 107.862 39.006 -4.149 1.00 0.00 N1 C
    ATOM 8 H6 THY C 1 108.479 39.755 -3.910 0.00 0.00 N1 H
    ATOM 9 C2 THY C 1 107.024 37.166 -5.449 1.00 0.00 N1 C
    ATOM 10 O2 THY C 1 107.071 36.398 -6.397 1.00 0.00 N1 O
    ATOM 11 N3 THY C 1 106.027 37.104 -4.506 1.00 0.00 N1 N
    =cut

    __DATA__
    ATOM 9 H5T THY C 1 107.274 35.359 -9.821 0.00 0.00 N1 H
    ATOM 10 O5' THY C 1 107.686 36.230 -9.553 1.00 0.00 N1 O
    ATOM 11 C5' THY C 1 108.813 35.973 -8.710 1.00 0.00 N1 C
    ATOM 12 H5' THY C 1 109.513 35.493 -9.239 0.00 0.00 N1 H
    ATOM 13 H5'' THY C 1 108.495 35.550 -7.861 0.00 0.00 N1 H
    ATOM 14 N1 THY C 1 107.956 38.157 -5.232 1.00 0.00 N1 N
    ATOM 15 C6 THY C 1 107.862 39.006 -4.149 1.00 0.00 N1 C
    ATOM 16 H6 THY C 1 108.479 39.755 -3.910 0.00 0.00 N1 H
    ATOM 17 C2 THY C 1 107.024 37.166 -5.449 1.00 0.00 N1 C
    ATOM 18 O2 THY C 1 107.071 36.398 -6.397 1.00 0.00 N1 O
    ATOM 19 N3 THY C 1 106.027 37.104 -4.506 1.00 0.00 N1 N
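To make the point about the /e flag and the "." concrete, here is a small sketch that runs the same substitution with and without /e. The sub name `renumber_line` and its arguments are hypothetical, chosen only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply the substitution to one
# line, with or without /e, and return the result.
sub renumber_line {
    my ($line, $num, $use_e) = @_;
    if ($use_e) {
        # With /e the replacement is Perl code: "." concatenates and
        # $num++ yields the current value of the counter.
        $line =~ s/^(ATOM\s+)(\d+)(.*)/$1.$num++.$3/e;
    }
    else {
        # Without /e the replacement is an interpolated string, so the
        # dots and the "++" are copied into the output literally.
        $line =~ s/^(ATOM\s+)(\d+)(.*)/$1.$num++.$3/;
    }
    return $line;
}

print renumber_line("ATOM 9 H5T THY", 1, 1), "\n";   # ATOM 1 H5T THY
print renumber_line("ATOM 9 H5T THY", 1, 0), "\n";   # ATOM .1++. H5T THY
```

Since "." in the pattern does not match a newline, the trailing newline of a real input line stays outside the match and is preserved.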