in reply to Re^5: How to write to a file?
in thread How to count the length of a sequence of alphabets and number of occurence of a particular alphabet in the sequence?

Yes, I did read that. And I am currently trying to implement it. But I am looking to solve the first question first. What can I do to fix that?

Replies are listed 'Best First'.
Re^7: How to write to a file?
by AnomalousMonk (Archbishop) on Oct 12, 2019 at 00:23 UTC

    Assuming that your input file contains only FASTA header records and sequence records (i.e., no blank lines, comment lines, etc.), you might do something like (untested):

    my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$read_filehandle>) { chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//;; do_something_with($sequence, $seq_len, $seq_n_A); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is }
    (I'm not sure about the codon codes I've used to define $rx_sequence; please double-check this. (Update: Note, also, that the sequence letters I've used are all upper-case, but IIRC these can also be lower-case.))


    Give a man a fish:  <%-{-{-{-<

      Following is the script that I tried with open:
      #!/usr/bin/perl use strict; use warnings; print 'Please enter protein sequence filename: '; chomp( my $prot_filename = <STDIN> ); open my $PROTFILE, '<', $prot_filename or die "Cannot open '$prot_filename' because: $!"; my $report_name = $prot_filename.'_alanine_report'; open my $out_file, '>', $report_name or die "Cannot open '$report_name' because: $!"; my $rx_fasta_header = qr{ \A > }xms; # header pattern - very naive my $rx_sequence = qr{ \A [ACDEFGHIKLMNPQRSTVWY_]+ \z }xms; # plea +se check my $sequence; # sequence accumulator - initially undefined RECORD: while (my $record = <$PROTFILE>) { chomp $record; # get rid of newline, if any # fasta header - process any sequence accumulated so far if ($record =~ $rx_fasta_header) { # do not process if no sequence accumulated so far (i.e., firs +t header) next RECORD if not defined $sequence; my $seq_len = length $sequence; my $seq_n_A = $sequence =~ tr/A//;; do_something_with($sequence, $seq_len, $seq_n_A); undef $sequence; # prepare to process next sequence next RECORD; } # part of sequence - accumulate if ($record =~ $rx_sequence) { $sequence .= $record; # accumulate sub-sequences (records) next RECORD; } die "bad record: '$record'"; # don't know what this is } # show all results print $out_file; close $out_file;
      But, it gave me the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42. And the outfile is empty. What am I doing wrong here?

        Your input file seems to contain an empty line, but the script does not provide for it.

        If empty lines are ok, you could define something like
        my $rx_empty = qr{ \A \s* \z }xms; # empty line (or whitespace on +ly)
        and after the chomp
        if ($record =~ $rx_empty) { # ignore next RECORD }

        All my coding suggestions are untested.

        ... the error: bad record: '' at ../alanine_vs_sequence_length_2.pl line 46, <$PROTFILE> line 42.

        (soonix has already addressed this.)

        ... the outfile is empty.

        Almost all processing of a sequence is done in the  do_something_with() subroutine, which you do not show. I assume this subroutine exists in your code because the code would not compile if it did not (because you're enabling strictures with strict). Because I cannot see the code, I cannot say why it produces no output.

        Here's another version of your posted code, including do_something_with(). It's not complete because I still don't know exactly what you want to do with each sequence. I've also done some re-organization, and handled the case of a sequence resulting from a final FASTA record (which I had not considered before). I'm using autodie to simplify file I/O error reporting. Again, this is untested.


        Give a man a fish:  <%-{-{-{-<