in reply to Help for a regex problem ?

It is "Perl" (the language) or "perl" (the program that implements Perl), but it is never, ever "PERL". Now go and write a thousand times "I will never write 'PERL'" (you may use Perl to write it).

If you want us to really help you, it would assist if you could give a few examples of your input file: some lines which have to match and some lines which should not match.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^2: Help for a regex problem ?
by hellworld (Novice) on Jul 13, 2009 at 20:24 UTC
    I will start writing that now. Um, the input is like this:

    ATOM 16 NZ LYS A 7 -19.664 15.558 -9.499 1.00 18.80 N
    ATOM 17 H LYS A 7 -19.967 21.014 -14.224 1.00 0.00 H

    (The 18.80 and N parts are supposed to follow 1.00 directly on the same line; they only wrap because of the comment width, doh.)

    I'm supposed to print out the ATOM lines that don't have "H" in the part next to the 17. It keeps printing out all lines despite the syntax.

    HETATM is working properly. Here is a sample of it (from another input file, 1FFL.pdb, since 1GRL.pdb doesn't have any HETATM lines in it):

    HETATM 1 N CXM A 1 -12.588 -1.070 15.591 1.00 25.28 N
    HETATM 2 CA CXM A 1 -11.877 -0.094 16.395 1.00 25.28 C

    I'm supposed to print the HETATM lines that don't have HOH in the part where CXM stands.

    HETATM 2153 O HOH A 300 -38.403 0.000 33.125 0.50 13.41 O
    HETATM 2154 O HOH A 301 -29.459 12.090 33.186 1.00 31.37 O

    These ones are successfully ignored by the program.
      Am I right in assuming that these lines are produced by some other program or machine? So there is a certain format it adheres to? We have to analyse the format.

      Guessing from what you gave here, it *seems* as if every record starts with ATOM and goes like this:

      ATOM  16  NZ  LYS  A  7  -19.664  15.558   -9.499  1.00  18.80  N
      ATOM  17  H   LYS  A  7  -19.967  21.014  -14.224  1.00   0.00  H

      I have added some spaces to align the fields.

      If my guess is correct, you need to find the lines which do not have an "H" in the third field.

      use strict;

      # Print every line whose third whitespace-separated field is not "H".
      while (<DATA>) {
          print if (split ' ')[2] ne 'H';
      }

      __DATA__
      ATOM 16 NZ LYS A 7 -19.664 15.558 -9.499 1.00 18.80 N
      ATOM 17 H LYS A 7 -19.967 21.014 -14.224 1.00 0.00 H

      Update: I leave it up to you to apply this to the "HETATM" file.
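
      For what it is worth, here is a minimal sketch of how the same idea might carry over to the HETATM lines. It assumes the residue name (CXM, HOH) is always the fourth whitespace-separated field, as in the samples above. The sub name keep_hetatm_line is made up for this illustration and is not from the original program.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: true for a HETATM line whose fourth field is not "HOH".
      # Assumes the fields are separated by whitespace, as in the sample lines.
      sub keep_hetatm_line {
          my ($line) = @_;
          my @fields = split ' ', $line;
          return 0 unless @fields >= 4 && $fields[0] eq 'HETATM';
          return $fields[3] ne 'HOH';
      }

      # Two of the sample lines quoted earlier in the thread.
      my @samples = (
          'HETATM 1 N CXM A 1 -12.588 -1.070 15.591 1.00 25.28 N',
          'HETATM 2153 O HOH A 300 -38.403 0.000 33.125 0.50 13.41 O',
      );

      # Only the CXM line passes the filter.
      print "$_\n" for grep { keep_hetatm_line($_) } @samples;
      ```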

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Thanks a lot, it helped greatly. Is there a way for me to use the .pdb files as the <DATA>? Because some files have about 29000 ATOM lines.
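
        One possible way to do that, sketched under a few assumptions: the .pdb file names are passed on the command line, only ATOM records are of interest, and the fields are always separated by whitespace. The sub name keep_atom_line and the script name below are made up for this illustration. PDB is a fixed-column format, so in real files neighbouring fields can touch and splitting on whitespace may then misplace a field.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: true for an ATOM line whose third field is not "H".
        sub keep_atom_line {
            my ($line) = @_;
            my @fields = split ' ', $line;
            return 0 unless @fields >= 3 && $fields[0] eq 'ATOM';
            return $fields[2] ne 'H';
        }

        # Read each file named on the command line one line at a time,
        # so a file with 29000 ATOM lines is never held in memory as a whole.
        for my $file (@ARGV) {
            open my $fh, '<', $file or die "Cannot open '$file': $!";
            while (my $line = <$fh>) {
                print $line if keep_atom_line($line);
            }
            close $fh;
        }
        ```

        Saved as, say, filter_atoms.pl, it could be run as: perl filter_atoms.pl 1GRL.pdb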