
I see some trouble with your code that parses the sequences out of the SEQRES lines. The .pdb format carries some redundant information. I used the chain letter (A, B, C, D, E, etc.) to delineate the start and end of each sequence. The number of 3-letter groups in the sequence would be another way. I am not sure what you are trying to do. At the end of the code are a couple of example FASTA segments that I cut and pasted from the program output.
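A rough sketch of the "count the 3-letter groups" idea, assuming the standard SEQRES layout in which every line repeats the chain's total residue count (numRes, columns 14-17) next to the chain ID (column 12). The helper name `seqres_by_chain` and the demo lines are made up for illustration; this is not the code used for the output shown later.

```perl
use strict;
use warnings;

# Hypothetical helper: collect SEQRES residues per chain from a list of lines.
# Returns a hash ref: chain ID => { expected => numRes, residues => [...] }.
sub seqres_by_chain {
    my @lines = @_;
    my %chain;
    for my $line (@lines) {
        next unless $line =~ /^SEQRES/;
        # Fixed columns: chain ID in column 12, numRes in columns 14-17,
        # residue names from column 20 onward.
        my $id      = substr($line, 11, 1);
        my $num_res = substr($line, 13, 4) + 0;
        my @names   = split ' ', substr($line, 19);
        $chain{$id}{expected} = $num_res;
        push @{ $chain{$id}{residues} }, @names;
    }
    return \%chain;
}

# Made-up demo lines in SEQRES layout (two lines for chain A, one for B).
my @demo = (
    'SEQRES   1 A   15  MET HIS HIS HIS HIS HIS HIS GLU ASN LEU TYR PHE GLN',
    'SEQRES   2 A   15  GLY ALA',
    'SEQRES   1 B    3  MET GLY SER',
);
my $chains = seqres_by_chain(@demo);
for my $id (sort keys %$chains) {
    my $got    = @{ $chains->{$id}{residues} };
    my $status = $got == $chains->{$id}{expected} ? 'ok' : 'MISMATCH';
    print "chain $id: $got residues ($status)\n";
}
```

Comparing the collected count against numRes gives a cheap sanity check that no SEQRES line was skipped by the parser.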

I have no idea what to do with the x,y,z atom coordinates or even what they mean. I averaged them for fun. I changed your regex a bit to make it easier to pick out the signed floating-point numbers.
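For what it is worth, the PDB format puts x, y, z on ATOM lines at fixed columns (31-38, 39-46, 47-54) as orthogonal coordinates in Angstroms, so averaging each axis gives the geometric center of the atoms. Below is a minimal sketch of reading them by column. The helper `atom_xyz` and the demo line are invented for illustration. The last two lines run a whitespace-based pattern like the one in the code on the same line, where it appears to capture the residue number followed by x and y.

```perl
use strict;
use warnings;

# Hypothetical helper: pull x, y, z from one ATOM line by column position.
# Assumes the standard layout: x in columns 31-38, y in 39-46, z in 47-54.
sub atom_xyz {
    my ($line) = @_;
    return unless $line =~ /^ATOM  / && length($line) >= 54;
    # Skip 30 characters, then take three 8-character fields;
    # adding 0 turns the padded strings into numbers.
    return map { $_ + 0 } unpack 'x30 A8 A8 A8', $line;
}

# Made-up demo line in ATOM layout.
my $demo = 'ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N';

my @by_column = atom_xyz($demo);
print "columns: @by_column\n";

# For comparison: the lazy .*? stops at the first run of three numeric fields.
my @by_regex = $demo =~ /^ATOM\s+.*?([\d.-]+)\s+([\d.-]+)\s+([\d.-]+)/;
print "regex:   @by_regex\n";
```

Fixed columns also cope with files where large or negative coordinates run together without any whitespace between them.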

To run the code, download the .pdb file and name it "6U9D.pdb". In production code I probably would not use so much memory for temporary results; I would prefer to calculate and output results as they occur rather than save a bunch of data and process it at the end. However, for prototypes like this, having the data in intermediate forms can be useful for debugging. Have fun.
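A sketch of what the lower-memory version could look like for the coordinate averages: three running sums and a counter instead of an array holding every atom. The helper `centroid_from_fh`, the fixed-column parsing, and the two demo ATOM lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: average coordinates from a filehandle in one pass,
# keeping only three running sums and a counter instead of every atom.
sub centroid_from_fh {
    my ($fh) = @_;
    my ($n, @sum) = (0, 0, 0, 0);
    while (my $line = <$fh>) {
        next unless $line =~ /^ATOM  / && length($line) >= 54;
        # Assumes x, y, z sit in columns 31-38, 39-46, 47-54.
        my @xyz = map { $_ + 0 } unpack 'x30 A8 A8 A8', $line;
        $sum[$_] += $xyz[$_] for 0 .. 2;
        $n++;
    }
    return unless $n;               # no ATOM records seen
    return map { $_ / $n } @sum;
}

# Demo on an in-memory filehandle; a real run would open "6U9D.pdb" instead.
my $pdb_text = <<'END';
ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00  0.00           N
ATOM      2  CA  MET A   1      20.000  40.000  60.000  1.00  0.00           C
END
open my $fh, '<', \$pdb_text or die "cannot open in-memory file: $!";
my ($cx, $cy, $cz) = centroid_from_fh($fh);
print "avg x = $cx\navg y = $cy\navg z = $cz\n";
```

The trade-off is the one mentioned above: nothing is left over to inspect afterwards, so the store-everything version stays handier while debugging.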

use strict;
use warnings;

open my $pdb_fh, '<', "6U9D.pdb" or die "unable to open 6U9D.pdb for reading $!";

my %amino_acid_conversion = (
    ALA => 'A',
    ARG => 'R',
    ASN => 'N',
    ASP => 'D',
    CYS => 'C',
    GLN => 'Q',
    GLU => 'E',
    GLY => 'G',
    HIS => 'H',
    ILE => 'I',
    LEU => 'L',
    LYS => 'K',
    MET => 'M',
    PHE => 'F',
    PRO => 'P',
    SER => 'S',
    THR => 'T',
    TRP => 'W',
    TYR => 'Y',
    VAL => 'V');
    
my @atoms;  #2D array of x,y,z of all atoms (many thousands)
my @seqs;   #raw sequences with the 3 letter designations
            #convert later to FASTA format

my $cur_seq = '';
my $cur_ltr = '';

# parse out data of interest from file
#
while (<$pdb_fh>)
{   
   if (my ($ltr, $seq) = (/^SEQRES\s+\d+\s+(\w+)\s+\d+\s+([A-Z ]+)$/) )
   {
      $seq =~ s/\s+$//;
      
      if ($ltr eq $cur_ltr)
      {
          $cur_seq .= " $seq";
      }
      else
      {
          push @seqs, $cur_seq if $cur_seq ne ''; # end of current sequence
          $cur_seq = $seq;                        # begin of the next sequence
          $cur_ltr = $ltr;
      }
   }   
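   elsif (my ($x,$y,$z) = (/^ATOM\s+.*?([\d.-]+)\s+([\d.-]+)\s+([\d.-]+)/) )
   {
      # Caveat: the lazy .*? stops at the first run of three numeric fields.
      # On a typical ATOM line that is likely the residue number, x, y
      # rather than x, y, z; fixed columns 31-54 would be more reliable.
      push @atoms, [$x,$y,$z];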
   }
}
push @seqs, $cur_seq if $cur_seq ne '';  # don't forget to finish the last seq!

#### output collected data ###

# make FASTA sequence segments
foreach my $seq (@seqs)
{  
   # my $fasta = join '',map{$amino_acid_conversion{$_}}split ' ',$seq;
  
   # without using a map:
   #
   my $fasta = '';
   foreach my $char3 (split ' ',$seq)
   {
      # 'X' stands in for any residue name missing from the table
      $fasta .= $amino_acid_conversion{$char3} // 'X';
   }
   
   print ">Some Fasta Description Line\n";
   while (length $fasta)       # 60 char lines; fasta suggested max is 80
   {
      print substr($fasta,0,60,''),"\n";
   }
}

#print the data points
# I am not sure what needs to be done with them
# average of each coordinate?

#foreach my $row_ref (@atoms)  #uncomment to print
#{
#   print "@$row_ref\n";       # space-separated x y z
#}

my ($xsum, $ysum, $zsum) = (0, 0, 0);
foreach my $row_ref (@atoms)  # @atoms is a 2D array
{
   my ($x, $y , $z ) = @$row_ref;
   $xsum+=$x;
   $ysum+=$y;
   $zsum+=$z;
}

die "no ATOM records found\n" unless @atoms;   # avoid dividing by zero
print "avg x = ",$xsum/@atoms,"\n";
print "avg y = ",$ysum/@atoms,"\n";
print "avg z = ",$zsum/@atoms,"\n";

__END__

These are 2 examples.
You will have to figure out what goes in the FASTA description line.
Perhaps not all of these sequences are relevant?  It looks like
a lot are duplicates.

>Some Fasta Description Line
MHHHHHHENLYFQGAPSFNVDPLEQPAEPSKLAKKLRAEPDMDTSFVGLTGGQIFNEMMS
RQNVDTVFGYPGGAILPVYDAIHNSDKFNFVLPKHEQGAGHMAEGYARASGKPGVVLVTS
GPGATNVVTPMADAFADGIPMVVFTGQVPTSAIGTDAFQEADVVGISRSCTKWNVMVKSV
EELPLRINEAFEIATSGRPGPVLVDLPKDVTAAILRNPIPTKTTLPSNALNQLTSRAQDE
FVMQSINKAADLINLAKKPVLYVGAGILNHADGPRLLKELSDRAQIPVTTTLQGLGSFDQ
EDPKSLDMLGMHGCATANLAVQNADLIIAVGARFDDRVTGNISKFAPEARRAAAEGRGGI
IHFEVSPKNINKVVQTQIAVEGDATTNLGKMMSKIFPVKERSEWFAQINKWKKEYPYAYM
EETPGSKIKPQTVIKKLSKVANDTGRHVIVTTGVGQHQMWAAQHWTWRNPHTFITSGGLG
TMGYGLPAAIGAQVAKPESLVIDIDGDASFNMTLTELSSAVQAGTPVKILILNNEEQGMV
TQWQSLFYEHRYSHTHQLNPDFIKLAEAMGLKGLRVKKQEELDAKLKEFVSTKGPVLLEV
EVDKKVPVLPMVAGGSGLDEFINFDPEVERQQTELRHKRTGGKH
>Some Fasta Description Line
MGSSHHHHHHSSGLVPRGSHMENLYFQGATRPPLPTLDTPSWNANSAVSSIIYETPAPSR
QPRKQHVLNCLVQNEPGVLSRVSGTLAARGFNIDSLVVCNTEVKDLSRMTIVLQGQDGVI
EQARRQIEDLVPVYAVLDYTNSEIIKRELVMARISLLGTEYFEDLLLHHHTSTNAGAADS
QELVAEIREKQFHPANLPASEVLRLKHEHLNDITNLTNNFGGRVVDISETSCIVELSAKP
TRISAFLKLVEPFGVLECARSGMMALPRTPLKTSTEEAADEDEKISEIVDISQLPPG
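
If the duplicates turn out to be identical chains, one option is to merge them and list the chain letters in the description line. This is only a sketch: the helper `unique_sequences`, the header text ">6U9D chains ..." and the short demo sequences are invented for illustration, not an established convention.

```perl
use strict;
use warnings;

# Hypothetical helper: merge chains that share an identical sequence.
# Input:  hash ref of chain ID => one-letter sequence.
# Output: list of [ "A,C", $sequence ] pairs, ordered by first chain ID.
sub unique_sequences {
    my ($seq_of) = @_;
    my %chains_with;
    push @{ $chains_with{ $seq_of->{$_} } }, $_ for sort keys %$seq_of;
    return map  { [ join(',', @{ $chains_with{$_} }), $_ ] }
           sort { $chains_with{$a}[0] cmp $chains_with{$b}[0] }
           keys %chains_with;
}

# Made-up demo data: chains A and C carry the same sequence.
my %demo = (
    A => 'MHHHHHHENLYFQGA',
    B => 'MGSSHHHHHH',
    C => 'MHHHHHHENLYFQGA',
);
for my $entry (unique_sequences(\%demo)) {
    my ($chains, $seq) = @$entry;
    print ">6U9D chains $chains\n$seq\n";
}
```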

I have no idea what these numbers would mean.

avg x = 321.013155298296
avg y = 290.744642162734
avg z = 69.196842162731

In reply to Re: Issues regarding for loops and recursion by Marshall
in thread Issues regarding for loops and recursion by Nickmofoe
