in reply to Count the sequence length of each entry in the file

G'day davi54,

"I want to count the number of alphabets in each entry, length of each entry, etc."

That's confusing, as you combine the counts for all entries in @count. Other variables declared outside the while loop appear misplaced: they look like they're associated with individual entries, so I'd expect them to be inside the while loop.

The following code collects all the data that I believe you want. You can combine values for all entries if necessary.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my %results;

{
    local $/ = '';    # paragraph mode: each read returns one blank-line-separated entry

    while (my $record = <DATA>) {
        $record =~ s/\A>(.+?)$//m;    # capture and remove the '>' header line
        my $entry = $1;
        $record =~ s/\s//gm;          # remove all whitespace, including newlines
        $results{$entry}{len} = length $record;

        # Count each residue in the sequence
        for (0 .. $results{$entry}{len} - 1) {
            ++$results{$entry}{count}{substr $record, $_, 1};
        }
    }
}

use Data::Dump;
dd \%results;

__DATA__
>sp_0005
VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA
DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS

>sp_0017
HVQLVESGGGSVQAGGSLRLTCAASGFTFSNYYMSWVRQAPGKGLEWVSSIYSVGSNGYY
ADSVKGRSTISRDNAKNTLYLQMNSLKPEDTAVYYCAAEPGGSWWDAYSYWGQGTQVTVS
S
```

Extract of output:

```
{
  sp_0005 => {
    count => { A => 9, C => 2, ..., W => 1, Y => 5, },
    len => 110,
  },
  sp_0017 => {
    count => { A => 10, C => 2, ..., W => 5, Y => 10, },
    len => 121,
  },
}
```
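The paragraph-mode read (`local $/ = ''`) assumes entries are separated by blank lines. If the real file is plain FASTA with no blank lines between entries (an assumption, since the original layout isn't known), a line-by-line reader that starts a new entry at each `>` header is one possible alternative. The sub name `read_fasta` is made up for illustration, and the sample data is read from an in-memory string rather than a real file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read FASTA-style records line by line.
# A new entry starts at each '>' header; sequence lines may wrap at any
# width, and blank lines between entries are optional.
sub read_fasta {
    my ($fh) = @_;
    my (%results, $entry);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip trailing newline (\n or \r\n) and spaces
        next unless length $line;        # skip blank lines
        if ($line =~ /\A>(.+)/) {        # header line: start a new entry
            $entry = $1;
            $results{$entry} = { len => 0, count => {} };
            next;
        }
        next unless defined $entry;      # ignore sequence data before the first header
        $line =~ s/\s//g;                # remove any whitespace left inside the sequence
        $results{$entry}{len} += length $line;
        ++$results{$entry}{count}{$_} for split //, $line;
    }
    return \%results;
}

# Sample data: the two entries from the thread, wrapped without blank lines.
my $text = <<'END';
>sp_0005
VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA
DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS
>sp_0017
HVQLVESGGGSVQAGGSLRLTCAASGFTFSNYYMSWVRQAPGKGLEWVSSIYSVGSNGYY
ADSVKGRSTISRDNAKNTLYLQMNSLKPEDTAVYYCAAEPGGSWWDAYSYWGQGTQVTVS
S
END

open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $results = read_fasta($fh);
close $fh;

for my $entry (sort keys %$results) {
    printf "%s len=%d\n", $entry, $results->{$entry}{len};
}
```

For a real file, replace the in-memory handle with `open my $fh, '<', $filename`. Stripping whitespace per line means the length never depends on whether the file uses `\n` or `\r\n` line endings.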

Notes:

— Ken

Re^2: Count the sequence length of each entry in the file
by davi54 (Sexton) on Oct 02, 2020 at 16:54 UTC
    Thank you for your help. The input file is formatted with at most 60 characters per line; longer sequences wrap onto the next line. That's why the second entry's last line contains just a single S.

    On a different note, for the first entry, the sequence length in your output (110) is one less than the actual sequence length, which is 111. My own output gives a sequence length of 115, which is even further off. Do you know where the error might be?

      "for the first entry, the sequence length value you get in your output (110) is one less than the actual sequence length which is 111."
      $ perl -E 'say length "VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA"'
      59
      $ perl -E 'say length "DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS"'
      51
      $ perl -E 'say 59+51'
      110

      If you count the newline between those two strings, you'll get 111 for \n or 112 for \r\n. Any whitespace after those strings increases the length further. As I already stated, you posted your data as paragraph text, so I can't tell what the original data looked like.
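A quick way to see how line endings and stray whitespace inflate the count is to compute the lengths directly. This sketch uses the two sp_0005 lines measured above; the trailing spaces and CRLF ending in `$raw` are assumptions standing in for whatever the real file contains.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The two sequence lines of sp_0005, as measured above.
my $line1 = 'VQLQESGGGLVQAGGSLRLSCAASGRAVSMYNMGWFRQAPGQERELVAAISRGGSIYYA';
my $line2 = 'DSVKGRFTISRDNAKNTLYLQMNNLKPEDTGVYQCRQGSTLGQGTQVTVSS';

# Assumed "raw" record: a trailing space, CRLF line ending,
# two more trailing spaces and a final newline.
my $raw = "$line1 \r\n$line2  \n";
(my $cleaned = $raw) =~ s/\s//g;    # same whitespace removal as the code above

my @cases = (
    [ 'joined, no separator' => length($line1 . $line2)   ],
    [ 'LF between lines'     => length("$line1\n$line2")   ],
    [ 'CRLF between lines'   => length("$line1\r\n$line2") ],
    [ 'raw record'           => length $raw                ],
    [ 'raw after s/\s//g'    => length $cleaned            ],
);

printf "%-22s %3d\n", @$_ for @cases;
```

Only the last case, with all whitespace removed, reliably matches the residue count (110); every other figure depends on how the file was saved.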

      I removed all whitespace in my code:

      $record =~ s/\s//gm;

      The correct length, after removing whitespace, is 110.
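To find out where extra length comes from (for example, 115 instead of 110), one option is to list every character in a record that isn't an uppercase letter, along with its position and code point. This is a debugging sketch: `report_unexpected` is a hypothetical name, and the short sample record with a CRLF break and a trailing space is assumed for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: report each character that is not A-Z, with its
# 0-based position and code point, so characters that inflate the
# length (\r, \n, spaces, tabs, digits, '*', lowercase letters) show up.
sub report_unexpected {
    my ($record) = @_;
    my @found;
    while ($record =~ /([^A-Z])/g) {
        push @found, sprintf 'pos %d: U+%04X', pos($record) - 1, ord $1;
    }
    return @found;
}

# Assumed sample record: CRLF line break plus a trailing space.
my $record = "VQLQES\r\nDSVKGR \n";
my @report = report_unexpected($record);

print "$_\n" for @report;
printf "raw length %d, letters only %d\n", length $record, ($record =~ tr/A-Z//);
```

Here `tr/A-Z//` with an empty replacement list only counts matching characters and leaves `$record` unchanged.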

      — Ken