
Providing the data makes a *big* difference, and now it is much clearer why your script is having problems:

You are tying yourself in knots trying to treat each line as a separate input record. This makes you think that a single record split between two lines is two records: a "header" record followed by a "data" record. In fact, newlines do not define records: ">" does. The newline is just whitespace that can be removed from the record. Nor is there a header record: the data you are calling a header is just one of many fields.
You are ignoring the labels provided for the data in each field rather than using them to their fullest.
You never check for undefined values. Instead, you seem to assume that fields will always have defined values when, in fact, they sometimes don't. Then when you try to split an undefined sequence or include some other undefined field in a print statement, you get (surprise, surprise) -- warnings!
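
To make the point about undefined values concrete, here is a minimal sketch. It is not taken from your script: the %field hash and its contents are made up for illustration. It shows an unguarded use of a missing field triggering a warning, and a guarded use that keeps a legitimate 0. It uses the defined-or operator (//), which needs Perl 5.10 or later; on older Perls an "unless defined" check does the same job.

```perl
use strict;
use warnings;

# Hypothetical record: the Seq field is absent, Missed is a legitimate 0.
my %field = (Protein => 'P1', Missed => 0);

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Unguarded use: interpolating an undefined value triggers a warning.
my $unguarded = "seq=$field{Seq}";

# Guarded use: defined-or substitutes the placeholder only when the
# value is undefined, so the 0 in Missed is kept as-is.
my $seq     = $field{Seq}    // '-';
my $missed  = $field{Missed} // '-';
my $guarded = "seq=$seq missed=$missed";

print "$guarded\n";
print "warnings caught: ", scalar(@warnings), "\n";
```

Only the unguarded line should produce a warning; the guarded lines stay quiet even though Seq is missing.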

There are a few other issues with regular expressions, variable name selection, and your methods for formatting output that deserve some attention. I've posted a revised version of your code with annotations, including how to choose an alternate record separator (see the documentation for $/ in perlvar for more information). Hopefully, you can use this to avoid parsing problems in future code.

use strict;
use warnings;

my %s2m = (A =>  71.0371, C => 103.0092, D => 115.0269, E => 129.0426,
        F => 147.0684, G =>  57.0215, H => 137.0589, I => 113.0841,
        K => 128.0950, L => 113.0841, M => 131.0405, N => 114.0429,
        P =>  97.0528, Q => 128.0586, R => 156.1011, S =>  87.0320,
        T => 101.0477, V =>  99.0684, W => 186.0793, Y => 163.0633,
        ' ' => 0.0, "*" => 0.0  # '\s' in single quotes is a literal backslash-s, never a space
       );

#when you need to keep labels and columns lined up, using
#printf with a format string is usually a better idea than
#one long string - for the header %s is the labels and for
#the rows %s are the values.  Also you can use the format
#string to set field widths - tabs aren't very reliable as
#a way to line up columns because different users can have
#different tab settings:

my $OUTPUT_FMT= "%-35s%-8s%-15s%-3s%-3s%s\n";
printf $OUTPUT_FMT, 'Prot_name', 'peptide', 'mass-to-charge'
  , 'z', 'p', 'sequence';

#since we don't want complaints about undefined values,
#we need to check for undefined values and provide a
#string representation for the undefined value - see below
#for how $MISSING is used

my $MISSING='-';

#There is no law saying you have to split your input on a newline.
#Since > marks the start of a new record, why not use that to
#divide records?  You can use $/ to choose a record separator

$/='>';

#For complicated parsing it is usually best to use a named variable
#to hold the record or line rather than rely on $_.

while ( my $line = <DATA> ) {

  #when you use a named variable, you need to pass it to chomp
  #like this:
  chomp $line;

  #this is a more efficient way to skip empty lines than /^\s*$/
  #just look for at least one non-space character
  next unless $line =~ /\S/;  #skip unless at least one non-space

  #unless white space around field and subfield separators is
  #an actual value, you should make whitespace part of
  #the separator - so /\s*\|\s*/ instead of /|/
  my @aFields = split /\s*\|\s*/, $line; #pipe is field separator

  #stripping whitespace from around separators will still leave
  #white space at start of first field and end of last field,
  #so strip it as a separate step
  if (@aFields) {
    #anchored \s+ also works when a field wraps across lines,
    #where a pattern using . would fail to match
    $aFields[0]  =~ s/^\s+//;
    $aFields[-1] =~ s/\s+$//;
  }

  #each field consists of label:value - taking advantage of this
  #we can just put field values into a hash where the label is
  #the key
  my %hFields;
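  foreach my $sField (@aFields) {
    #limit of 2 keeps any further colons inside the value
    my ($sLabel, $sValue) = split(/\s*:\s*/, $sField, 2);
    next unless defined $sLabel;  #skip empty fields
    $hFields{$sLabel} = $sValue;
  }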

  #If a field is missing from a record, then attempts to pass
  #it to split, length, or printf will result in complaints
  #about undefined values, so before using anything either
  #check to see if it is defined and/or assign it a default
  #value

  #Note also, we check for "unless defined($hFields{Missed})"
  #and not just "unless $hFields{Missed}".  A plain "unless $var"
  #would also replace a legitimate value of 0 with $MISSING.

  $hFields{Protein}=$MISSING unless defined($hFields{Protein});
  $hFields{Peptide}=$MISSING unless defined($hFields{Peptide});
  $hFields{Missed}=$MISSING unless defined($hFields{Missed});

  #To split a string into characters, use // not / /.
  #Also if you are calculating mass, why not use mass as a
  #variable name: it is much more self documenting

  #my $total = 0.0;
  #my @peptides = split (/ /, $sequence);#65

  my $mass=0.0;
  if (defined($hFields{Seq})) {
    my @peptides = split(//, $hFields{Seq});
    foreach my $peptide (@peptides) {
      $mass += ($s2m{$peptide}+18.0106) if defined $s2m{$peptide};
    }
  } else {
    $hFields{Seq} = $MISSING;
  }

  printf $OUTPUT_FMT, $hFields{Protein}, $hFields{Peptide}
    , $mass, '1', $hFields{Missed}, $hFields{Seq};
}
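
Since the script above reads from DATA but the data itself isn't reproduced here, the following is a small self-contained sketch of the same record-splitting idea that you can run as-is. The sample records are invented for illustration (only the field labels Protein, Peptide, Missed and Seq come from the code above), and it reads from an in-memory filehandle rather than a real file. It also uses local to confine the change to $/ to one block, which is an optional refinement rather than something the script above does.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records, the first wrapped across
# two lines, the second with no Seq field at all.
my $sample = <<'END';
>Protein: P1 | Peptide: 1 |
Missed: 0 | Seq: ACD
>Protein: P2 | Peptide: 2 | Missed: 1
END

# Open the string as if it were a file.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '>';    # records end at '>', not at a newline
    while (my $record = <$fh>) {
        chomp $record;                  # chomp strips the trailing '>'
        next unless $record =~ /\S/;    # skip the empty chunk before the first '>'
        $record =~ s/^\s+|\s+$//g;      # trim whitespace at both ends

        # Split on pipes, then split each field into label and value.
        my %field = map {
            my ($label, $value) = split /\s*:\s*/, $_, 2;
            ($label => $value);
        } split /\s*\|\s*/, $record;

        push @records, \%field;
    }
}
close $fh;

# Print protein name and sequence, with '-' for a missing sequence.
printf "%-10s%-5s\n", $_->{Protein}, $_->{Seq} // '-' for @records;
```

The wrapped first record comes back as a single hash with all four fields, because the newline is swallowed as whitespace around the pipe separator.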

Best, beth

In reply to Re: perl task3 by ELISHEVA
in thread perl task3 by huzefa
