The
my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # please check
statement defines a regex to match a correct sequence. The attached comment reflects the fact, noted in a previous post, that I'm not a bio-monk and am not at all certain of the correct amino acid or protein (if that's what they are) code letters to use.
Is there a code in the
SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG
sequence that is not included in the
[ACDEFGHIKLMNPQRSTVWY_]
character class that is used to recognize a correct sequence? If there is, is the sequence still correct? If the sequence is correct, should the regex be changed to include the unrecognized code? As I said, I'm not an expert on biological topics and I cannot say. Please take a look at the sequence and the regex and, based on your training and experience, decide on the proper course of action.
AFAICT, the regex, as defined, does not match the sequence in question and the sequence is properly (according to the code as it now stands!) rejected as being a bad record.
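One way to see exactly which code trips the regex is to scan the sequence with the negated character class. This is only a minimal sketch: find_unrecognized is a hypothetical helper name of my own, not part of the code posted in this thread, and it simply reports each unrecognized character together with its zero-based offsets.

```perl
use strict;
use warnings;

# Character class copied from the $rx_sequence definition in the post.
my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms;

my $sequence = 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG';

# Hypothetical helper (not from the original thread): returns a hash ref
# mapping each unrecognized character to a list of its 0-based offsets.
sub find_unrecognized {
    my ($seq) = @_;
    my %offsets_of;
    while ($seq =~ m{ ([^ACDEFGHIKLMNPQRSTVWY_]) }xmsg) {
        # pos() is the offset just past the one-character match.
        push @{ $offsets_of{$1} }, pos($seq) - 1;
    }
    return \%offsets_of;
}

my $bad = find_unrecognized($sequence);
for my $char (sort keys %$bad) {
    printf "'%s' at offset(s) %s\n", $char, join ', ', @{ $bad->{$char} };
}
print $sequence =~ $rx_sequence ? "match\n" : "no match\n";
```

For the sequence in question this reports a single 'U'. As far as I know, U is commonly used as the one-letter code for selenocysteine, but whether it belongs in your data (and so in the character class) is a decision for someone who knows the biology.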
BTW: The subroutine
sub do_something_with {
    my ($sequence,  # accumulated sequence
        $fh,        # output filehandle
        ) = @_;
    my $seq_len = length $sequence;
}
still does nothing to write to the output file.
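For what it's worth, here is a rough sketch of how the subroutine could actually write something to the filehandle it is passed. The output format (length, a tab, then the sequence) and the returned length are purely my assumptions, since the required format hasn't been specified; the in-memory filehandle is only there to make the example self-contained.

```perl
use strict;
use warnings;

sub do_something_with {
    my ($sequence,  # accumulated sequence
        $fh,        # output filehandle
        ) = @_;

    my $seq_len = length $sequence;

    # ASSUMED output format: length, tab, sequence. Adjust to taste.
    print {$fh} "$seq_len\t$sequence\n"
        or die "write failed: $!";

    return $seq_len;
}

# Usage: write to an in-memory filehandle so the result can be inspected.
open my $fh_out, '>', \my $buffer or die "open failed: $!";
do_something_with('ACDEFG', $fh_out);
close $fh_out or die "close failed: $!";
print $buffer;
```

In the real script, $fh would be the handle already opened on the output file rather than an in-memory buffer.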
Give a man a fish: <%-{-{-{-<
In reply to Re^11: How to write to a file?
by AnomalousMonk
in thread How to count the length of a sequence of alphabets and number of occurence of a particular alphabet in the sequence?
by davi54