in reply to Re^8: How to write to a file?
in thread How to count the length of a sequence of alphabets and number of occurence of a particular alphabet in the sequence?

All my coding suggestions are untested.

... the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42.

(soonix has already addressed this.)

... the outfile is empty.

Almost all processing of a sequence is done in the  do_something_with() subroutine, which you do not show. I assume this subroutine exists in your code because the code would not compile if it did not (because you're enabling strictures with strict). Because I cannot see the code, I cannot say why it produces no output.

Here's another version of your posted code, including do_something_with(). It's not complete because I still don't know exactly what you want to do with each sequence. I've also done some re-organization, and handled the case of a sequence resulting from a final FASTA record (which I had not considered before). I'm using autodie to simplify file I/O error reporting. Again, this is untested.

use strict; use warnings; use autodie; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename; my $report_name = "${prot_filename}_alanine_report"; open my $out_file, '>', $report_name; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $rx_blank_line = qr{ \A \s* \z }xms; my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { # handle any records to be ignored here next RECORD if $record =~ $rx_blank_line; chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; do_something_with($sequence, $out_file); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # end while RECORD loop # handle any sequence accumulated from final FASTA record do_something_with($sequence, $out_file) if defined $sequence; # clean up close $PROTFILE; close $out_file; exit; # subroutines ################################################ sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//; ... # further processing print $fh ...; # output final result of processing }


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^10: How to write to a file?
by davi54 (Sexton) on Oct 15, 2019 at 21:38 UTC
    I'm sorry I did not include my subroutines in the previous message. But, I am counting the length of the sequence in the subroutine but it's not working yet. I tried your suggestion as follows:
    #!/usr/bin/perl use strict; use warnings; use autodie; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename; my $report_name = "${prot_filename}_alanine_report"; open my $out_file, '>', $report_name; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $rx_blank_line = qr{ \A \s* \z }xms; my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { # handle any records to be ignored here next RECORD if $record =~ $rx_blank_line; chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; do_something_with($sequence, $out_file); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # end while RECORD loop # handle any sequence accumulated from final FASTA record do_something_with($sequence, $out_file) if defined $sequence; # clean up close $PROTFILE; close $out_file; exit; # subroutines # sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }
    But I'm getting the error - bad record: 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG' at ../alanine_vs_sequence_length_3.pl line 47, <$_...> line 1370. And I get this error for both your and my script. Does it have something to do with the "die" command on line 47 as it is pointing out?

      The
          my $rx_sequence     = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms;  # please check
      statement defines a regex to match a correct sequence. The attached comment reflects the fact, noted elsewhere in the post | in a previous post, that I'm not a bio-monk and am not at all certain of the correct amino or protein (if that's what they are) code letters to use.

      Is there a code in the
          SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG
      sequence that is not included in the
          [ACDEFGHIKLMNPQRSTVWY_]
      character class that is used to recognize a correct sequence? If there is, is the sequence still correct? If the sequence is correct, should the regex be changed to include the unrecognized code? As I said, I'm not expert on biological topics and I cannot say. Please take a look at the sequence and the regex and, based on your training and experience, decide on the proper course of action.

      AFAICT, the regex, as defined, does not match the sequence in question and the sequence is properly (according to the code as it now stands!) rejected as being a bad record.

      BTW: The subroutine

      sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }
      still does nothing to write to the output file.


      Give a man a fish:  <%-{-{-{-<

        That is absolutely possible. Some proteins do have rare amino acids such as Selenocysteine which is represented by U, etc. I'll change the regex.

        Thanks for pointing out the subroutine. I was missing the print commands. Following is what I did and it worked:
        sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//; printf $out_file "Number of alanines = $seq_n_A\n\n" ; printf $out_file "sequence length = $seq_len\n\n" ; }