I see some trouble with your code to parse out the sequences from SEQRES lines. There is some redundant information in the .pdb format. I used the sequential letter (A,B,C,D,E,etc) to delineate start/end of the various sequences. Number of 3 letter groups in the sequence would be another way. I am not sure what you are trying to do. At the end of code, see a couple of example FASTA segments I cut n' pasted from the program output.

I have no ide what to do with the x,y,z atom coordinates or even what they mean. I averaged them for fun. I changed your regex a bit to make it easier to pick out the +-floating point numbers.

To run code, download the .pdb file and name it "6U9D.pdb". In production code, I probably would not use so much memory for temporary results, preferring to calculate and output results as they occur rather than save a bunch of data and process it at the end. However for these prototype things, having the data in intermediate forms can useful for debugging. Have fun.

use strict; use warnings; open my $pdb_fh, '<', "6U9D.pdb" or die "unable to open 6U9D.pdb for r +eading $!"; my %amino_acid_conversion = ( ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C', GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I', LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P', SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V'); my @atoms; #2D array of x,y,z of all atoms (many thousands) my @seqs; #raw sequences with the 3 letter designations #convert later to FASTA format my $cur_seq = ''; my $cur_ltr = ''; # parse out data of interest from file # while (<$pdb_fh>) { if (my ($ltr, $seq) = (/^SEQRES\s+\d+\s+(\w+)\s+\d+\s+([A-Z ]+)$/) +) { $seq =~ s/\s+$//; if ($ltr eq $cur_ltr) { $cur_seq .= " $seq"; } else { push @seqs, $cur_seq if $cur_seq ne ''; # end of current seq +uence $cur_seq = $seq; # begin of the next +sequence $cur_ltr = $ltr; } } elsif (my ($x,$y,$z) = (/^ATOM\s+.*?([\d.-]+)\s+([\d.-]+)\s+([\d.-] ++)/) ) { push @atoms, [$x,$y,$z]; } } push @seqs, $cur_seq; # don't forget to finish the last seq! #### output collected data ### #make a fasta sequence segments foreach my $seq (@seqs) { # my $fasta = join '',map{$amino_acid_conversion{$_}}split ' ',$seq +; # without using a map: # my $fasta =''; foreach my $char3 (split ' ',$seq) { $fasta.= $amino_acid_conversion{$char3} } print ">Some Fasta Description Line\n"; #use 60 char lines while ($fasta) #fasta suggested max is 80 { print substr($fasta,0,60,''),"\n"; } } #print the data points # I am not sure what needs to be done with them # average of each coordinate? #foreach my $row_ref (@atoms) #uncomment to print #{ # print @$row_ref,"\n"; #} my $xsum; my $ysum; my $zsum; foreach my $row_ref (@atoms) # @atoms is a 2D array { my ($x, $y , $z ) = @$row_ref; $xsum+=$x; $ysum+=$y; $zsum+=$z; } print "avg x = ",$xsum/@atoms,"\n"; print "avg y = ",$ysum/@atoms,"\n"; print "avg z = ",$zsum/@atoms,"\n"; __END__ These are 2 examples: You will have to figure out what goes in the FASTA description line And perhaps not all of these sequences are relevant? Looks like a lot are duplicates. >Some Fasta Description Line MHHHHHHENLYFQGAPSFNVDPLEQPAEPSKLAKKLRAEPDMDTSFVGLTGGQIFNEMMS RQNVDTVFGYPGGAILPVYDAIHNSDKFNFVLPKHEQGAGHMAEGYARASGKPGVVLVTS GPGATNVVTPMADAFADGIPMVVFTGQVPTSAIGTDAFQEADVVGISRSCTKWNVMVKSV EELPLRINEAFEIATSGRPGPVLVDLPKDVTAAILRNPIPTKTTLPSNALNQLTSRAQDE FVMQSINKAADLINLAKKPVLYVGAGILNHADGPRLLKELSDRAQIPVTTTLQGLGSFDQ EDPKSLDMLGMHGCATANLAVQNADLIIAVGARFDDRVTGNISKFAPEARRAAAEGRGGI IHFEVSPKNINKVVQTQIAVEGDATTNLGKMMSKIFPVKERSEWFAQINKWKKEYPYAYM EETPGSKIKPQTVIKKLSKVANDTGRHVIVTTGVGQHQMWAAQHWTWRNPHTFITSGGLG TMGYGLPAAIGAQVAKPESLVIDIDGDASFNMTLTELSSAVQAGTPVKILILNNEEQGMV TQWQSLFYEHRYSHTHQLNPDFIKLAEAMGLKGLRVKKQEELDAKLKEFVSTKGPVLLEV EVDKKVPVLPMVAGGSGLDEFINFDPEVERQQTELRHKRTGGKH >Some Fasta Description Line MGSSHHHHHHSSGLVPRGSHMENLYFQGATRPPLPTLDTPSWNANSAVSSIIYETPAPSR QPRKQHVLNCLVQNEPGVLSRVSGTLAARGFNIDSLVVCNTEVKDLSRMTIVLQGQDGVI EQARRQIEDLVPVYAVLDYTNSEIIKRELVMARISLLGTEYFEDLLLHHHTSTNAGAADS QELVAEIREKQFHPANLPASEVLRLKHEHLNDITNLTNNFGGRVVDISETSCIVELSAKP TRISAFLKLVEPFGVLECARSGMMALPRTPLKTSTEEAADEDEKISEIVDISQLPPG I have no idea what these numbers would mean? avg x = 321.013155298296 avg y = 290.744642162734 avg z = 69.196842162731

In reply to Re: Issues regarding for loops and recursion by Marshall
in thread Issues regarding for loops and recursion by Nickmofoe

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.