
Am I right in assuming that these lines are produced by some other program or machine? If so, they adhere to a certain format, and we have to analyse that format.

Guessing from what you gave here, it *seems* as if every record starts with ATOM and goes like this:

ATOM     16  NZ  LYS A   7     -19.664  15.558  -9.499  1.00 18.80           N
ATOM     17  H   LYS A   7     -19.967  21.014 -14.224  1.00  0.00           H
I have added some spaces to align the fields.
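To check the guess, one record can be split on whitespace and each field given a name. The field names below are an assumption borrowed from the PDB ATOM record layout; they are not present in the data itself. The sketch shows that index 2 of the split list (the third field) holds the atom name.

```perl
use strict;
use warnings;

# Hypothetical names for the whitespace-separated fields, borrowed from
# the PDB ATOM record layout (an assumption about this data).
my @field_names = qw(record serial name resName chainID resSeq
                     x y z occupancy tempFactor element);

my $line = 'ATOM     16  NZ  LYS A   7     -19.664  15.558  -9.499  1.00 18.80           N';

# split ' ' splits on runs of whitespace and ignores leading blanks;
# a hash slice stores each value under its field name.
my %atom;
@atom{@field_names} = split ' ', $line;

# The third field (index 2) is the atom name.
print "atom name: $atom{name}, element: $atom{element}\n";
# prints "atom name: NZ, element: N"
```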

If my guess is correct, you need to find the lines which do not have an "H" in the third field.

use strict;
use warnings;

# Print every record whose third whitespace-separated field is not "H"
while (<DATA>) {
    print if (split ' ')[2] ne 'H';
}

__DATA__
ATOM     16  NZ  LYS A   7     -19.664  15.558  -9.499  1.00 18.80           N
ATOM     17  H   LYS A   7     -19.967  21.014 -14.224  1.00  0.00           H
Update: I leave it up to you to apply this to the "HETATM" file.
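For the "HETATM" records, one possible approach is sketched below, assuming the files follow the standard PDB fixed-column layout. Splitting on whitespace can misfire there: the serial number occupies columns 7-11, directly after the six-character record name, so a serial of 10000 or more produces a token such as "HETATM10001" and every later field shifts by one. Reading the atom name from fixed columns avoids that. The helper name keep_record is hypothetical.

```perl
use strict;
use warnings;

# Assumption: fixed-column PDB layout (record name in columns 1-6,
# serial number in 7-11, atom name in 13-16).
sub keep_record {
    my ($line) = @_;
    my $record = substr $line, 0, 6;          # columns 1-6

    # Pass every other record type (REMARK, TER, END, ...) through unchanged.
    return 1 unless $record eq 'ATOM  ' || $record eq 'HETATM';
    return 1 if length $line < 16;            # too short to hold an atom name

    my $name = substr $line, 12, 4;           # columns 13-16
    $name =~ s/^\s+|\s+$//g;                  # atom names are padded with blanks
    return $name ne 'H';                      # drop atoms named exactly "H"
}

my @lines = (
    'ATOM     16  NZ  LYS A   7     -19.664  15.558  -9.499  1.00 18.80           N',
    'ATOM     17  H   LYS A   7     -19.967  21.014 -14.224  1.00  0.00           H',
    'HETATM10001  O   HOH A 201      10.000  20.000  30.000  1.00 15.00           O',
);

# Keep the NZ atom and the water oxygen, drop the "H" atom.
my @kept = grep { keep_record($_) } @lines;
print "$_\n" for @kept;
```

To filter a whole file, call keep_record on each line inside the read loop and print the lines for which it returns true. If the aim is to remove every hydrogen, not only atoms named exactly "H", the element symbol in columns 77-78 may be a more reliable test; that is a suggestion, not something established in this thread.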

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^4: Help for a regex problem ?
by hellworld (Novice) on Jul 13, 2009 at 20:52 UTC
    Thanks a lot, it helped greatly. Is there a way for me to use the .pdb files instead of <DATA>? Some files have about 29000 ATOM lines.
      Yes, of course. Open a filehandle on that file and read from it instead of <DATA>:
      open my $fh, '<', 'path/to/my/PDB/file'
          or die "Could not open PDB file: $!";
      while (<$fh>) {
          ...
      }
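      A fuller sketch of the same idea, using the third-field rule from the first reply. The helper filter_hydrogens is hypothetical, and the file path stays a placeholder. Because the loop reads one line at a time, a file with 29000 ATOM lines is not a problem for memory. For a self-contained demo, in-memory filehandles (opened on a reference to a scalar) stand in for the real input and output files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out whose third
# whitespace-separated field is not "H". Reads one line at a time,
# so memory use does not grow with the size of the file.
sub filter_hydrogens {
    my ($in, $out) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        my $name = (split ' ', $line)[2];
        next if defined $name && $name eq 'H';
        print {$out} $line;
        $kept++;
    }
    return $kept;    # number of lines written
}

# With a real file (placeholder path):
#   open my $fh, '<', 'path/to/my/PDB/file'
#       or die "Could not open PDB file: $!";
#   filter_hydrogens($fh, \*STDOUT);
#   close $fh;

# Demo: in-memory filehandles stand in for the files.
my $pdb = <<'END_PDB';
ATOM     16  NZ  LYS A   7     -19.664  15.558  -9.499  1.00 18.80           N
ATOM     17  H   LYS A   7     -19.967  21.014 -14.224  1.00  0.00           H
END_PDB

open my $in, '<', \$pdb or die "Cannot open in-memory input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open in-memory output: $!";
my $count = filter_hydrogens($in, $out);
close $in;
close $out;
print $result;
```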

      CountZero
