in reply to Re^9: How to write to a file?
in thread How to count the length of a sequence of alphabets and number of occurence of a particular alphabet in the sequence?

I'm sorry I did not include my subroutines in the previous message. But, I am counting the length of the sequence in the subroutine but it's not working yet. I tried your suggestion as follows:
#!/usr/bin/perl use strict; use warnings; use autodie; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename; my $report_name = "${prot_filename}_alanine_report"; open my $out_file, '>', $report_name; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $rx_blank_line = qr{ \A \s* \z }xms; my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { # handle any records to be ignored here next RECORD if $record =~ $rx_blank_line; chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; do_something_with($sequence, $out_file); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # end while RECORD loop # handle any sequence accumulated from final FASTA record do_something_with($sequence, $out_file) if defined $sequence; # clean up close $PROTFILE; close $out_file; exit; # subroutines # sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }
But I'm getting the error - bad record: 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG' at ../alanine_vs_sequence_length_3.pl line 47, <$_...> line 1370. And I get this error for both your and my script. Does it have something to do with the "die" command on line 47 as it is pointing out?

Replies are listed 'Best First'.
Re^11: How to write to a file?
by AnomalousMonk (Archbishop) on Oct 15, 2019 at 22:40 UTC

    The
        my $rx_sequence     = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms;  # please check
    statement defines a regex to match a correct sequence. The attached comment reflects the fact, noted elsewhere in the post | in a previous post, that I'm not a bio-monk and am not at all certain of the correct amino or protein (if that's what they are) code letters to use.

    Is there a code in the
        SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG
    sequence that is not included in the
        [ACDEFGHIKLMNPQRSTVWY_]
    character class that is used to recognize a correct sequence? If there is, is the sequence still correct? If the sequence is correct, should the regex be changed to include the unrecognized code? As I said, I'm not expert on biological topics and I cannot say. Please take a look at the sequence and the regex and, based on your training and experience, decide on the proper course of action.

    AFAICT, the regex, as defined, does not match the sequence in question and the sequence is properly (according to the code as it now stands!) rejected as being a bad record.

    BTW: The subroutine

    sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }
    still does nothing to write to the output file.


    Give a man a fish:  <%-{-{-{-<

      That is absolutely possible. Some proteins do have rare amino acids such as Selenocysteine which is represented by U, etc. I'll change the regex.

      Thanks for pointing out the subroutine. I was missing the print commands. Following is what I did and it worked:
      sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//; printf $out_file "Number of alanines = $seq_n_A\n\n" ; printf $out_file "sequence length = $seq_len\n\n" ; }