in reply to Re^6: How to write to a file?
in thread How to count the length of a sequence of alphabets and number of occurence of a particular alphabet in the sequence?

Assuming that your input file contains only FASTA header records and sequence records (i.e., no blank lines, comment lines, etc.), you might do something like (untested):

my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$read_filehandle>) { chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//;; do_something_with($sequence, $seq_len, $seq_n_A); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is }
(I'm not sure about the codon codes I've used to define $rx_sequence; please double-check this. (Update: Note, also, that the sequence letters I've used are all upper-case, but IIRC these can also be lower-case.))


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^8: How to write to a file?
by davi54 (Sexton) on Oct 15, 2019 at 17:32 UTC
    Following is the script that I tried with open:
    #!/usr/bin/perl use strict; use warnings; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename or die "Cannot open '$prot_filename' because: $!"; my $report_name = $prot_filename.'_alanine_report'; open my $out_file, '>', $report_name or die "Cannot open '$report_name' because: $!"; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//;; do_something_with($sequence, $seq_len, $seq_n_A); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # show all results print $out_file; close $out_file;
    But, it gave me the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42. And the outfile is empty. What am I doing wrong here?

      Your input file seems to contain an empty line, but the script does not provide for it.

      If empty lines are ok, you could define something like
      my $rx_empty = qr{ \A \s* \z }xms; # empty line (or whitespace on +ly)
      and after the chomp
      if ($record =~ $rx_empty) { # ignore next RECORD }

      All my coding suggestions are untested.

      ... the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42.

      (soonix has already addressed this.)

      ... the outfile is empty.

      Almost all processing of a sequence is done in the  do_something_with() subroutine, which you do not show. I assume this subroutine exists in your code because the code would not compile if it did not (because you're enabling strictures with strict). Because I cannot see the code, I cannot say why it produces no output.

      Here's another version of your posted code, including do_something_with(). It's not complete because I still don't know exactly what you want to do with each sequence. I've also done some re-organization, and handled the case of a sequence resulting from a final FASTA record (which I had not considered before). I'm using autodie to simplify file I/O error reporting. Again, this is untested.


      Give a man a fish:  <%-{-{-{-<

        I'm sorry I did not include my subroutines in the previous message. But, I am counting the length of the sequence in the subroutine but it's not working yet. I tried your suggestion as follows:
        #!/usr/bin/perl use strict; use warnings; use autodie; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename; my $report_name = "${prot_filename}_alanine_report"; open my $out_file, '>', $report_name; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $rx_blank_line = qr{ \A \s* \z }xms; my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { # handle any records to be ignored here next RECORD if $record =~ $rx_blank_line; chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; do_something_with($sequence, $out_file); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # end while RECORD loop # handle any sequence accumulated from final FASTA record do_something_with($sequence, $out_file) if defined $sequence; # clean up close $PROTFILE; close $out_file; exit; # subroutines # sub do_something_with { my ($sequence, # accumulated sequence $fh, # output filehandle ) = @_; my $seq_len = length $sequence; }
        But I'm getting the error - bad record: 'SRALGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWVDIKNANLVVVMGGNAAEAHPVG' at ../alanine_vs_sequence_length_3.pl line 47, <$_...> line 1370. And I get this error for both your and my script. Does it have something to do with the "die" command on line 47 as it is pointing out?