
I will start writing that now. Um, the input is like this:

ATOM 16 NZ LYS A 7 -19.664 15.558 -9.499 1.00 18.80 N
ATOM 17 H LYS A 7 -19.967 21.014 -14.224 1.00 0.00 H

(The 18.80 and N parts are supposed to follow 1.00 directly on the same line; any wrapping comes from the comment box, not from the file, doh.)

I'm supposed to print out the ATOM lines that don't have "H" in the field right after the 17. It keeps printing out all lines despite the syntax.

HETATM is working properly. Here is a sample of it, taken from the input file 1FFL.pdb, since 1GRL.pdb doesn't have any HETATM records in it:

HETATM 1 N CXM A 1 -12.588 -1.070 15.591 1.00 25.28 N
HETATM 2 CA CXM A 1 -11.877 -0.094 16.395 1.00 25.28 C

I'm supposed to print the HETATM lines that don't have HOH in the field where CXM stands.

HETATM 2153 O HOH A 300 -38.403 0.000 33.125 0.50 13.41 O
HETATM 2154 O HOH A 301 -29.459 12.090 33.186 1.00 31.37 O

These ones are successfully ignored by the program.

Re^3: Help for a regex problem ?
by CountZero (Bishop) on Jul 13, 2009 at 20:39 UTC
    Am I right in assuming that these lines are produced by some other program or machine? So there is a certain format it adheres to? We have to analyse the format.

    Guessing from what you gave here, it *seems* as if every record starts with ATOM and goes like this:

    ATOM  16  NZ  LYS  A  7  -19.664  15.558   -9.499  1.00  18.80  N
    ATOM  17  H   LYS  A  7  -19.967  21.014  -14.224  1.00   0.00  H
    I have added some spaces to align the fields.

    If my guess is correct, you need to find the lines which do not have an "H" in the third field.

    use strict;

    while (<DATA>) {
        print if (split ' ')[2] ne 'H';
    }

    __DATA__
    ATOM 16 NZ LYS A 7 -19.664 15.558 -9.499 1.00 18.80 N
    ATOM 17 H LYS A 7 -19.967 21.014 -14.224 1.00 0.00 H
    Update: I leave it up to you to apply this to the "HETATM" file.
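
For readers who want a starting point, here is a minimal sketch of how the same field test might carry over to HETATM records. It assumes whitespace-separated fields exactly as in the samples quoted in this thread, with the residue name (CXM, HOH) in the fourth field. The sub name `keep_record` and the `@sample` array are made up for illustration and are not from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns true if a record
# should be printed. Assumes whitespace-separated fields as in the samples.
sub keep_record {
    my ($line) = @_;
    my @f = split ' ', $line;
    return 0 unless @f >= 4;

    # ATOM: skip records whose third field (atom name) is exactly "H".
    return $f[2] ne 'H'   if $f[0] eq 'ATOM';

    # HETATM: skip records whose fourth field (residue name) is "HOH".
    return $f[3] ne 'HOH' if $f[0] eq 'HETATM';

    return 0;    # any other record type is ignored in this sketch
}

# Sample records copied from the thread.
my @sample = (
    'ATOM 16 NZ LYS A 7 -19.664 15.558 -9.499 1.00 18.80 N',
    'ATOM 17 H LYS A 7 -19.967 21.014 -14.224 1.00 0.00 H',
    'HETATM 1 N CXM A 1 -12.588 -1.070 15.591 1.00 25.28 N',
    'HETATM 2153 O HOH A 300 -38.403 0.000 33.125 0.50 13.41 O',
);

print "$_\n" for grep { keep_record($_) } @sample;
```

With these four records, only the `ATOM 16` and `HETATM 1` lines are printed. Note that the test is an exact match on "H", as in the code above it, so atoms with other names that merely start with H would still be printed.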

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Thanks a lot, it helped greatly. Is there a way for me to use the .pdb files as the <DATA>? Some files have about 29000 ATOM lines.
        Yes of course. Open a filehandle to that file and use the filehandle instead of <DATA>:
        open my $fh, '<', 'path/to/my/PDB/file'
            or die "Could not open PDB file: $!";
        while (<$fh>) {
            ...
        }
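
Putting the filehandle and the field test together, a complete script might look like the sketch below. It reads one line at a time, so a file with 29000 ATOM lines is never held in memory all at once. The sub name `print_non_h_atoms`, the script name `filter.pl`, and taking the path from the command line are all assumptions for illustration. Unlike the `<DATA>` example, this version also checks that the first field is ATOM before printing.

```perl
use strict;
use warnings;

# Hypothetical helper: copies to $out every ATOM record read from $in
# whose third whitespace-separated field is not "H".
# Returns the number of lines printed.
sub print_non_h_atoms {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        my @f = split ' ', $line;
        next unless @f >= 3 && $f[0] eq 'ATOM';   # only ATOM records
        next if $f[2] eq 'H';                     # skip atom name "H"
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Usage (script name is a placeholder): perl filter.pl 1GRL.pdb
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<', $path
        or die "Could not open PDB file '$path': $!";
    my $n = print_non_h_atoms($fh, \*STDOUT);
    close $fh;
    warn "$n ATOM lines printed\n";
}
```

Passing the handles in as arguments keeps the filter separate from the file handling, so the same sub can be pointed at a different file, or at a different output handle, without changes.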

        CountZero
