in reply to Re^10: How to write to a file?
in thread How to count the length of a sequence of alphabets and number of occurence of a particular alphabet in the sequence?

The
    my $rx_sequence     = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms;  # please check
statement defines a regex to match a correct sequence. The attached comment reflects the fact, noted in a previous post, that I'm not a bio-monk and am not at all certain of the correct amino acid or protein (if that's what they are) code letters to use.

Is there a code in the
    SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG
sequence that is not included in the
    [ACDEFGHIKLMNPQRSTVWY_]
character class that is used to recognize a correct sequence? If there is, is the sequence still correct? If the sequence is correct, should the regex be changed to include the unrecognized code? As I said, I'm no expert on biological topics and I cannot say. Please take a look at the sequence and the regex and, based on your training and experience, decide on the proper course of action.

AFAICT, the regex, as defined, does not match the sequence in question and the sequence is properly (according to the code as it now stands!) rejected as being a bad record.
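For what it's worth, here is a minimal sketch of how one might check this mechanically rather than by eye. It uses the regex and the sequence exactly as quoted above; the names %seen and @unexpected are made up for this illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Regex and sequence copied from the post.
my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms;
my $sequence    = 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG';

# Does the whole sequence consist only of recognized codes?
my $matches = $sequence =~ $rx_sequence ? 1 : 0;

# Collect each distinct character that falls outside the character class
# (hypothetical helper variables, for illustration only).
my %seen;
my @unexpected = grep { !$seen{$_}++ }
                 $sequence =~ m{ ([^ACDEFGHIKLMNPQRSTVWY_]) }xmsg;

print 'sequence matches: ', ($matches ? 'yes' : 'no'), "\n";
print "unrecognized codes: @unexpected\n";
```

The negated class in the second match must mirror the class in $rx_sequence, so if one is changed the other has to be changed with it.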

BTW: The subroutine

    sub do_something_with {
        my ($sequence,  # accumulated sequence
            $fh,        # output filehandle
            ) = @_;

        my $seq_len = length $sequence;
    }
still does nothing to write to the output file.


Give a man a fish:  <%-{-{-{-<

Re^12: How to write to a file?
by davi54 (Sexton) on Oct 16, 2019 at 14:56 UTC
    That is absolutely possible. Some proteins do contain rare amino acids such as selenocysteine, which is represented by U. I'll change the regex.
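A minimal sketch of what that change might look like, assuming the fix is simply to add U to the existing character class. Only U is added because it is the only extra code confirmed in this thread; whether any other codes belong in the class is a separate decision.

```perl
use strict;
use warnings;

# Original character class plus U (selenocysteine), as confirmed in this thread.
# Assumption: no other codes need to be added.
my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTUVWY_]+ \z }xms;

my $sequence = 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG';

print $sequence =~ $rx_sequence ? "accepted\n" : "rejected\n";
```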

    Thanks for pointing out the subroutine. I was missing the print commands. Following is what I did and it worked:
        sub do_something_with {
            my ($sequence,  # accumulated sequence
                $fh,        # output filehandle
                ) = @_;

            my $seq_len = length $sequence;
            my $seq_n_A = $sequence =~ tr/A//;

            printf $out_file "Number of alanines = $seq_n_A\n\n" ;
            printf $out_file "sequence length = $seq_len\n\n" ;
        }
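One possible variant, offered as a sketch rather than a correction: the subroutine above writes to $out_file, which is presumably a variable visible from the enclosing script, and never uses the $fh it is passed. The version below writes through $fh instead, and uses print rather than printf because the strings contain no format directives. The in-memory filehandle and the $buffer variable are assumptions made only so the example is self-contained.

```perl
use strict;
use warnings;

sub do_something_with {
    my ($sequence,  # accumulated sequence
        $fh,        # output filehandle
        ) = @_;

    my $seq_len = length $sequence;
    my $seq_n_A = $sequence =~ tr/A//;   # tr/// with no replacement just counts

    # Write through the filehandle that was passed in.
    print {$fh} "Number of alanines = $seq_n_A\n\n";
    print {$fh} "sequence length = $seq_len\n\n";

    return;
}

# Demonstration with an in-memory filehandle (assumption: in the real
# script $fh would be a handle opened on the output file).
my $buffer = '';
open my $fh, '>', \$buffer or die "Cannot open in-memory file: $!";
do_something_with(
    'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG', $fh);
close $fh or die "Cannot close in-memory file: $!";

print $buffer;
```

Passing the handle in keeps the subroutine independent of whatever the output variable happens to be called in the calling code.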