There are a few other issues, with regular expressions, variable name selection, and your methods for formatting output, that deserve some attention. I've posted a revised version of your code with annotations, including how to choose an alternate record separator. See the documentation for $/ in perlvar (perldoc perlvar) for more information. Hopefully, you can use this to avoid parsing problems in future code.
use strict;
use warnings;

my %s2m = (
    A => 71.0371,  C => 103.0092, D => 115.0269, E => 129.0426,
    F => 147.0684, G => 57.0215,  H => 137.0589, I => 113.0841,
    K => 128.0950, L => 113.0841, M => 131.0405, N => 114.0429,
    P => 97.0528,  Q => 128.0586, R => 156.1011, S => 87.0320,
    T => 101.0477, V => 99.0684,  W => 186.0793, Y => 163.0633,
    '\s' => 0.0,   "*" => 0.0
);

#when you need to keep labels and columns lined up, using
#printf with a format string is usually a better idea than
#one long string - for the header %s is the labels and for
#the rows %s are the values. Also you can use the format
#string to set field widths - tabs aren't very reliable as
#a way to line up columns because different users can have
#different tab settings:

my $OUTPUT_FMT= "%-35s%-8s%-15s%-3s%-3s%s\n";
printf $OUTPUT_FMT, 'Prot_name', 'peptide', 'mass-to-charge'
    , 'z', 'p', 'sequence';

#since we don't want complaints about undefined values,
#we need to check for undefined values and provide a
#string representation for the undefined value - see below
#for how $MISSING is used

my $MISSING='-';

#There is no law saying you have to split your input on a newline.
#Since > marks the start of a new record, why not use that to
#divide records? You can use $/ to choose a record separator

$/='>';

#For complicated parsing it is usually best to use a named variable
#to hold the record or line rather than rely on $_.

while ( my $line = <DATA> ) {

    #when you use a named variable, you need to pass it to chomp
    #like this:
    chomp $line;

    #this is a more efficient way to skip empty lines than /^\s*$/
    #just look for at least one non-space character
    next unless $line =~ /\S/; #skip unless at least one non-space

    #unless white space around field and subfield separators is
    #an actual value, you should make whitespace part of
    #the separator - so /\s*\|\s*/ instead of /|/
    my @aFields = split /\s*\|\s*/, $line; #pipe is field separator

    #stripping whitespace from around separators will still leave
    #white space at start of first field and end of last field,
    #so strip it as a separate step
    if (scalar(@aFields)) {
        $aFields[0] =~ s/^\s*(\S.*)$/$1/;
        $aFields[-1] =~ s/^(.*\S)\s*$/$1/;
    }

    #each field consists of label:value - taking advantage of this
    #we can just put field values into a hash where the label is
    #the key
    my %hFields;
    foreach my $sField (@aFields) {
        my ($sLabel, $sValue) = split(/\s*:\s*/, $sField);
        $hFields{$sLabel} = $sValue;
    }

    #If a field is missing from a record, then attempts to pass
    #it to split, length, or printf will result in complaints
    #about undefined values, so before using anything either
    #check to see if it is defined and/or assign it a default
    #value
    #Note also, we check for "unless defined($hFields{Missed})"
    #and not just "unless $hFields{Missed}"
    #unless $var would also set $var=$MISSING if $var==0.
    $hFields{Protein}=$MISSING unless defined($hFields{Protein});
    $hFields{Peptide}=$MISSING unless defined($hFields{Peptide});
    $hFields{Missed}=$MISSING unless defined($hFields{Missed});

    #To split a string into characters, use // not / /.
    #Also if you are calculating mass, why not use mass as a
    #variable name: it is much more self documenting
    #my $total = 0.0;
    #my @peptides = split (/ /, $sequence);#65
    my $mass=0.0;
    if (defined($hFields{Seq})) {
        my @peptides = split(//, $hFields{Seq});
        foreach my $peptide (@peptides) {
            $mass += ($s2m{$peptide}+18.0106) if defined $s2m{$peptide};
        }
    } else {
        $hFields{Seq} = $MISSING;
    }

    printf $OUTPUT_FMT, $hFields{Protein}, $hFields{Peptide}
        , $mass, '1', $hFields{Missed}, $hFields{Seq};
}
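One caveat about the record separator: the listing above assigns to $/ globally, so any later read in the same program will also split on '>'. A minimal sketch of one way to contain that, assuming you wrap the reading in a sub and use local. The sample input and the sub name read_records are made up for illustration and are not from your data:

```perl
use strict;
use warnings;

# Hypothetical sample input: two '>'-delimited records held in a string.
my $input = ">Protein: P1 | Missed: 0 | Seq: AG\n>Protein: P2 | Seq: GG\n";

# read_records is an illustrative helper, not part of the original code.
# It reads from an in-memory filehandle and localizes $/ so the change
# is undone automatically when the sub returns.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = '>';
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;                  # removes a trailing '>' (the value of $/), not "\n"
        next unless $record =~ /\S/;    # skip the empty chunk before the first '>'
        $record =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @records = read_records($input);
print scalar(@records), " records\n";
print "$_\n" for @records;
```

Note that chomp strips whatever $/ currently holds, so with $/ set to '>' the newline at the end of each record is still there and has to be trimmed separately.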
Best, beth
In reply to Re: perl task3
by ELISHEVA
in thread perl task3
by huzefa