in reply to Count the sequence length of each entry in the file

my %prot;
$para =~ s/([ACDEFGHIKLMNPQRSTVWY])/ ++$prot{ $1 } /eg;
$len = length($para);

You are getting the length after you modify the variable!
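One way to avoid this, sketched under the assumption that `$para` already holds a bare sequence (the sample value below is made up for illustration): take the length first, and count residues with a list-context match so the string is never rewritten.

```perl
use strict;
use warnings;

# Hypothetical sample sequence; in the real script $para comes from the input file.
my $para = "MKVLAAGIVG";

# Take the length before anything has a chance to rewrite $para.
my $len = length $para;

# Count residues without modifying the string:
# a /g match in list context returns every matching character.
my %prot;
$prot{$_}++ for $para =~ /[ACDEFGHIKLMNPQRSTVWY]/g;

print "length: $len\n";
print "$_: $prot{$_}\n" for sort keys %prot;
```

With `s///e` each residue is replaced by its running count, so the string no longer has its original length afterwards; the match-only form leaves `$para` untouched, so the order of the two steps stops mattering.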

Re^2: Count the sequence length of each entry in the file
by davi54 (Sexton) on Oct 01, 2020 at 20:27 UTC
    Ohh, I see. I instead placed it like this:
    $para =~ s/^\s*#.*//mg; $len = length($para);
    and got the output. However, the count is incorrect. For example, in the first entry, after removing the header, the sequence length (uppercase letters only) is 111, whereas the script reports 115. What am I doing wrong? Why is the script returning the wrong value?
      When you strip off the header the line endings remain. Try deleting "\n" characters. If on windows also delete "\r" characters.
      # Remove fasta header line
      if ( $para =~ s/^>(.*)//m ){
          $name = $1;
      }
      # Remove comment line(s)
      $para =~ s/^\s*#.*//mg;
      $para =~ tr/\r\n//d;
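To see where the extra characters come from, here is a small sketch on a made-up record (the header text and the sequence are hypothetical, not taken from the original file). It compares the length right after the header is stripped with the length after the line endings are deleted.

```perl
use strict;
use warnings;

# Hypothetical record: one header line followed by two sequence lines.
my $para = ">demo_entry\nMKVLA\nAGIVG\n";

# Remove the header line; the newline that ended it stays behind.
my $name;
if ( $para =~ s/^>(.*)//m ) {
    $name = $1;
}
my $before = length $para;    # still counts three "\n" characters

# Delete line endings, then measure again.
$para =~ tr/\r\n//d;
my $after = length $para;     # residues only

print "$name: before=$before after=$after\n";
```

Each line of the record contributes one newline to the count, including the now empty header line, so the overshoot equals the number of lines in the record.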
        If on windows also delete "\r" characters.

        This is not necessary as the PerlIO :crlf layer is default on Windows and converts CRLF to LF on input. One can disable the translation with binmode or the :raw pseudolayer, but that's not the case in any of the code shown here. See also Newlines in perlport, and note that chomp also handles paragraph mode correctly.
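A minimal sketch of the paragraph-mode point, assuming records are separated by blank lines and using an in-memory filehandle with made-up data in place of the real file. `chomp` removes the trailing newlines of each paragraph, but newlines inside the record still have to be deleted before taking the length.

```perl
use strict;
use warnings;

# Hypothetical two-record input held in memory, standing in for the FASTA file.
my $data = ">one\nMKV\nLAA\n\n\n>two\nGIV\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my @lengths;
{
    local $/ = "";              # paragraph mode: blank lines separate records
    while ( my $para = <$fh> ) {
        chomp $para;            # strips all trailing newlines of the paragraph
        $para =~ s/^>.*//m;     # drop the header line
        $para =~ tr/\n//d;      # interior newlines are not touched by chomp
        push @lengths, length $para;
    }
}
print "@lengths\n";
```

No `\r` handling appears here because the sketch assumes the input has already been through the default translation described above; a handle opened with `:raw` or put through `binmode` would need `tr/\r\n//d` instead.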