
I love it. "Taking the last line..." (which is line 570 of the sample input): 66 occurs in the 6th column of the first row and the 293rd row, 16.840 occurs in the 9th column of line 293 from the sample input, and 'B' occurs on the last row and the 293rd, but not on the first. But a little searching through the sample input resolved any ambiguity.

Anyway, this is fixed-width stuff, so you should probably be thinking in terms of unpack or substr rather than regular expressions:

use strict;
use warnings;

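while ( <DATA> ) {
  next unless /^ATOM\b/;                    # skip anything that isn't an ATOM record
  my $chain = substr $_, 21, 1;             # 1 character at offset 21
  my $position = 0 + substr $_, 23, 3;      # 3 characters at offset 23; "0 +" numifies and drops the padding
  my $Zcoordinate = 0 + substr $_, 47, 7;   # 7 characters at offset 47
  print "$chain, $position, $Zcoordinate\n";
}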

__DATA__
ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98      A    N
ATOM     31  CA  HIS A  66       7.318  14.688  12.568  1.00 12.48      A    C
ATOM     32  C   HIS A  66       8.676  14.309  13.156  1.00 11.62      A    C
ATOM     33  O   HIS A  66       9.708  14.518  12.545  1.00 11.76      A    O
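If counting zero-based offsets by hand feels error-prone, here is a minimal sketch of a helper that takes 1-based, inclusive column numbers (the way a text editor shows them) and turns them into substr arguments. The name `field` is my own invention for illustration, not something from a module, and the column numbers are simply the substr offsets above shifted by one.

```perl
use strict;
use warnings;

# Hypothetical helper: return columns $first..$last (1-based, inclusive).
sub field {
    my ( $line, $first, $last ) = @_;
    return substr $line, $first - 1, $last - $first + 1;
}

my $line = 'ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98';

my $chain       = field( $line, 22, 22 );       # same as substr $_, 21, 1
my $position    = 0 + field( $line, 24, 26 );   # same as substr $_, 23, 3
my $Zcoordinate = 0 + field( $line, 48, 54 );   # same as substr $_, 47, 7

print "$chain, $position, $Zcoordinate\n";      # prints "A, 66, 11.222"
```

Whether the extra sub is worth it probably depends on how many fields you end up pulling out; for three fields, bare substr is arguably just as readable.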

Update:

Using unpack is probably the more computationally efficient alternative, though from a programmer's standpoint it always takes me longer to work out the template, which is why I posted the substr solution first. Now that I've had time to work out the template for the unpack solution, here it is:

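while ( <DATA> ) {
  next unless /^ATOM\b/;
  # x21 skips to offset 21, a1 = chain, x skips one, A3 = position, x21 skips to offset 47, A7 = Z
  my( $chain, $position, $Zcoordinate ) = unpack( 'x21a1xA3x21A7',$_);
  $_ += 0 for $position, $Zcoordinate;   # 'A' strips only trailing spaces, so numify to drop leading padding
  print "$chain, $position, $Zcoordinate\n";
}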

__DATA__
ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98      A    N
ATOM     31  CA  HIS A  66       7.318  14.688  12.568  1.00 12.48      A    C
ATOM     32  C   HIS A  66       8.676  14.309  13.156  1.00 11.62      A    C
ATOM     33  O   HIS A  66       9.708  14.518  12.545  1.00 11.76      A    O
ATOM     34  CB  HIS A  66       6.450  13.434  12.442  1.00 12.81      A    C
ATOM     35  CG  HIS A  66       5.000  13.829  12.378  1.00 13.36      A    C
ATOM     36  ND1 HIS A  66       4.332  14.002  11.175  1.00 13.57      A    N
ATOM     37  CD2 HIS A  66       4.073  14.085  13.360  1.00 13.93      A    C
ATOM     38  CE1 HIS A  66       3.063  14.347  11.461  1.00 14.23      A    C
ATOM     39  NE2 HIS A  66       2.851  14.410  12.778  1.00 14.47      A    N
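For anyone else who finds unpack templates slow to read, here is a sketch that spells the template out one piece at a time and checks that it picks the same fields as the substr offsets. The variable names are mine, chosen for illustration. One thing to keep in mind: `A` strips trailing spaces but leaves leading ones alone, so the numeric fields are numified before printing.

```perl
use strict;
use warnings;

my $line = 'ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98';

# The same template as 'x21a1xA3x21A7', one piece per line:
my $template = join '',
    'x21',    # skip 21 bytes (offsets 0..20)
    'a1',     # chain: 1 byte at offset 21
    'x',      # skip 1 byte (offset 22)
    'A3',     # position: 3 bytes at offsets 23..25
    'x21',    # skip 21 bytes (offsets 26..46)
    'A7';     # Z coordinate: 7 bytes at offsets 47..53

my @via_unpack = unpack $template, $line;
my @via_substr = (
    substr( $line, 21, 1 ),
    substr( $line, 23, 3 ),
    substr( $line, 47, 7 ),
);

# Numify the two numeric fields in both lists to drop leading padding.
$_ += 0 for @via_unpack[ 1, 2 ], @via_substr[ 1, 2 ];

print "unpack: @via_unpack\n";    # prints "unpack: A 66 11.222"
print "substr: @via_substr\n";    # prints "substr: A 66 11.222"
```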

Whether you use substr or unpack, you'll then be able to feed the input into a data structure as described by Choroba.
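I don't want to put words in Choroba's mouth, so treat the following as just one possible shape for that structure, not necessarily the one suggested: a hash keyed by chain, then by position, holding the list of Z coordinates seen for that residue. The sub name `collect_z` and the layout are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: $z_for->{$chain}{$position} = [ Z coordinates ... ]
sub collect_z {
    my @records = @_;
    my %z_for;
    for my $record (@records) {
        next unless $record =~ /^ATOM\b/;
        my ( $chain, $position, $z ) = unpack 'x21a1xA3x21A7', $record;
        # "+ 0" numifies, dropping the leading padding that 'A' leaves in place.
        push @{ $z_for{$chain}{ $position + 0 } }, $z + 0;
    }
    return \%z_for;
}

my @lines = (
    'ATOM     30  N   HIS A  66       7.514  15.296  11.222  1.00 12.98',
    'ATOM     31  CA  HIS A  66       7.318  14.688  12.568  1.00 12.48',
    'ATOM     32  C   HIS A  66       8.676  14.309  13.156  1.00 11.62',
);

my $z_for = collect_z(@lines);

for my $chain ( sort keys %$z_for ) {
    for my $position ( sort { $a <=> $b } keys %{ $z_for->{$chain} } ) {
        print "$chain $position: @{ $z_for->{$chain}{$position} }\n";
    }
}
# prints "A 66: 11.222 12.568 13.156"
```

From there, selecting "specific lines" becomes a matter of looking up the chain and position you care about instead of re-scanning the file.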

Dave

In reply to Re: How to select specific lines from a file by davido
in thread How to select specific lines from a file by Anonymous Monk
