I love it. "Taking the last line..." (which is line 570 of the sample input): 66 occurs in the 6th column of the first row and the 293rd row, 16.840 occurs in the 9th column of line 293 from the sample input, and 'B' occurs on the last row and the 293rd, but not on the first. But a little searching through the sample input resolved any ambiguity.

Anyway, this is fixed-width stuff, so you should probably be thinking in terms of unpack or substr rather than regular expressions:

use strict;
use warnings;

while ( <DATA> ) {
    next unless /^ATOM\b/;                     # only ATOM records
    my $chain       = substr $_, 21, 1;        # column 22: chain ID
    my $position    = 0 + substr $_, 23, 3;    # columns 24-26: residue number
    my $Zcoordinate = 0 + substr $_, 47, 7;    # columns 48-54: Z coordinate
    print "$chain, $position, $Zcoordinate\n";
}

__DATA__
ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98      A    N
ATOM     31  CA  HIS A  66       7.318  14.688  12.568  1.00 12.48      A    C
ATOM     32  C   HIS A  66       8.676  14.309  13.156  1.00 11.62      A    C
ATOM     33  O   HIS A  66       9.708  14.518  12.545  1.00 11.76      A    O

Update:

Using unpack is generally a more computationally efficient alternative, though from a programmer's standpoint it always takes me longer to work out the template, which is why I posted the substr solution first. Now that I've had time to work out the template, here is the unpack solution:

use strict;
use warnings;

while ( <DATA> ) {
    next unless /^ATOM\b/;    # only ATOM records
    # x21 skips to the chain ID, x skips one column, x21 skips to the Z field.
    # The A fields strip trailing spaces only, so leading padding is kept.
    my( $chain, $position, $Zcoordinate ) = unpack( 'x21a1xA3x21A7', $_ );
    print "$chain, $position, $Zcoordinate\n";
}

__DATA__
ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98      A    N
ATOM     31  CA  HIS A  66       7.318  14.688  12.568  1.00 12.48      A    C
ATOM     32  C   HIS A  66       8.676  14.309  13.156  1.00 11.62      A    C
ATOM     33  O   HIS A  66       9.708  14.518  12.545  1.00 11.76      A    O
ATOM     34  CB  HIS A  66       6.450  13.434  12.442  1.00 12.81      A    C
ATOM     35  CG  HIS A  66       5.000  13.829  12.378  1.00 13.36      A    C
ATOM     36  ND1 HIS A  66       4.332  14.002  11.175  1.00 13.57      A    N
ATOM     37  CD2 HIS A  66       4.073  14.085  13.360  1.00 13.93      A    C
ATOM     38  CE1 HIS A  66       3.063  14.347  11.461  1.00 14.23      A    C
ATOM     39  NE2 HIS A  66       2.851  14.410  12.778  1.00 14.47      A    N
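Since working out the template is the fiddly part, here is a rough sketch of one way to derive it from the same offset/length pairs that substr uses. The helper name build_template and the @fields list are my own illustration, not anything from the thread, and the sample line assumes the standard fixed-width PDB column layout.

```perl
use strict;
use warnings;

# One ATOM record in fixed-width layout (the spacing matters).
my $line = 'ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98      A    N';

# (offset, length) pairs, exactly as the substr calls would read them.
my @fields = ( [ 21, 1 ], [ 23, 3 ], [ 47, 7 ] );

# Hypothetical helper: turn offset/length pairs into an unpack template.
# For each field, skip the gap since the previous field, then take the field.
sub build_template {
    my @spec = @_;
    my ( $template, $cursor ) = ( '', 0 );
    for my $field (@spec) {
        my ( $offset, $length ) = @$field;
        my $gap = $offset - $cursor;
        $template .= "x$gap" if $gap > 0;
        $template .= "A$length";
        $cursor = $offset + $length;
    }
    return $template;
}

my $template   = build_template(@fields);    # x21A1x1A3x21A7
my @via_unpack = unpack $template, $line;
my @via_substr = map { substr $line, $_->[0], $_->[1] } @fields;

print "template: $template\n";
print "unpack:   @via_unpack\n";
print "substr:   @via_substr\n";
```

The generated template is equivalent to the hand-written one: x1 is the same as a bare x, and A1 versus a1 makes no difference for a single non-space character.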

Whether you use substr or unpack, you'll then be able to feed the input into a data structure as described by Choroba.
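Choroba's structure isn't reproduced in this post, so the following is only a sketch under my own assumptions: a hash keyed by chain, then by residue position, holding the Z coordinates seen for that residue. The sub name parse_atoms and the shape of the hash are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect Z coordinates per chain and residue position.
# The hash layout is an assumption, not necessarily what Choroba proposed.
sub parse_atoms {
    my @lines = @_;
    my %by_chain;
    for my $line (@lines) {
        next unless $line =~ /^ATOM\b/;    # skip anything that is not an ATOM record
        my ( $chain, $position, $z ) = unpack 'x21a1xA3x21A7', $line;
        # Adding 0 numifies the fields and drops their leading padding.
        push @{ $by_chain{$chain}{ 0 + $position } }, 0 + $z;
    }
    return \%by_chain;
}

my @sample = (
    'ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98      A    N',
    'ATOM     31  CA  HIS A  66       7.318  14.688  12.568  1.00 12.48      A    C',
    'REMARK this line is skipped',
);

my $atoms = parse_atoms(@sample);
for my $chain ( sort keys %$atoms ) {
    for my $position ( sort { $a <=> $b } keys %{ $atoms->{$chain} } ) {
        print "$chain $position: @{ $atoms->{$chain}{$position} }\n";
    }
}
```

For the two sample records this prints "A 66: 11.222 12.568". Reading from a real file would just mean passing the lines from the filehandle instead of @sample.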


Dave


In reply to Re: How to select specific lines from a file by davido
in thread How to select specific lines from a file by Anonymous Monk
