imlou has asked for the wisdom of the Perl Monks concerning the following question:

If I have a file that is full of letters with no spaces, only newline characters, how would I be able to count the occurrences of certain letters in the file? Example: the file contains FDIELSIGICOXLSAGICK\n and I'm looking for the occurrences of the letters F, I, and L.

Re: counting occurrences
by davido (Cardinal) on Sep 19, 2003 at 19:17 UTC
    Try this if all you want is a combined total of all the things you're searching for (a much more flexible solution is posted in a subsequent followup):

    use strict;
    use warnings;

    my $count;
    $count += tr/FIL/FIL/ while <DATA>;
    print "F, I, and L occurred $count times.\n";

    __DATA__
    FDIELSIGCOXLSAGICK

    Update:

    This approach will give you a combined total count for all the items you're counting. If you want an individual count for each thing you're counting, with the ability to easily add or subtract items from the count list, see my followup post. Therein you will find two examples that both work well. The first (which I prefer) uses tr///, and the second uses index. I think you'll find those solutions to be much closer to what you need.
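For comparison, here is a minimal sketch (not from the thread) of the trade-off described above: a single tr/FIL// gives one combined total, while one literal tr/// per letter gives separate totals without any eval. The catch is that the letters are hard-coded, so adding or removing one means editing the code. The sample data sits in an in-memory filehandle (an assumption, so the snippet runs on its own).

```perl
use strict;
use warnings;

# Illustrative input: one header line, then sequence lines (in-memory file)
my $text = "header line\nFDIELSIGCOXLSAGICK\nFDIELSIGCOXLSAGICK\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $header = <$fh>;    # discard the header line
my $combined = 0;
my %each;
while (my $line = <$fh>) {
    # One tr/// listing all three letters: a single combined total
    $combined += ($line =~ tr/FIL//);
    # One literal tr/// per letter: separate totals, no eval needed
    $each{F} += ($line =~ tr/F//);
    $each{I} += ($line =~ tr/I//);
    $each{L} += ($line =~ tr/L//);
}
close $fh;

print "Combined F+I+L: $combined\n";
printf "%s: %d\n", $_, $each{$_} for sort keys %each;
```

An empty replacement list makes tr/// count matching characters without changing the string, so each line is left intact.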

    Hope this helps!

    Dave

    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

      Would that give me the occurrences of each letter individually, or only their combined total? I also have to remove the first line of the file because it's header information I don't need. I have this so far, but it doesn't seem to work:

      while(<FILE>){
          (s!>.*(\n)!!);
          my @words = (A, C, G, T); #letters to search
          my @file = split (/\w/g, <FILE>); #split each letter into an array
          foreach $file (@file){
              foreach $words (@words){
                  if ($file eq $words){
                      $word_list{words}++;
                  }
              }
          }
          @pairs = sort {$a->[1] <=> $b->[1]} map {[ $_ => $word_list{$_}]} keys %word_list;
          print "word $_ ->[0] = $_->[1]\n" for @pairs;
      It's been a while since I've used Perl, so I'm very, very rusty. Thanks!
        If you want to count the occurrences of each item independently, strip the first line off as a header line, and keep the list of items you're counting easily adjustable, this will do the trick:

        use strict;
        use warnings;

        my $header_line = <DATA>;
        my %count;
        my @chars = ( qw/F G S/ );
        while ( my $line = <DATA> ) {
            eval "\$count{$_} += \$line =~ tr/$_/$_/,1" or die $@
                foreach @chars;
        }
        print "There are $count{$_} occurrences of $_\n" foreach sort keys %count;

        __DATA__
        Sample header line
        FDIELSIGCOXLSAGICK
        FDIELSIGCOXLSAGICK

        The reason the tr/// must appear inside a string eval is that variables are not interpolated in tr/// (the transliteration table is built at compile time, not run time). The string eval forces a fresh compilation of tr/// each time through the loop.

        The reason I escape the sigils (\$count and \$line) is that I want those variables to exist as variables inside the eval, not as interpolated values (except for what's inside the tr/// itself).

        And the ',1' at the end of the eval expression makes the eval always return a true value. Without it, a line with no matches (while the running total is still 0) would make eval return 0 and trigger the "or die $@".
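To make the escaping concrete, this small sketch (not from the thread; @generated is a made-up name) builds the same code strings that davido's loop hands to eval for F, G and S, and prints them instead of evaluating them:

```perl
use strict;
use warnings;

my @chars = qw/F G S/;
my @generated;
foreach (@chars) {
    # Inside double quotes, \$ keeps a literal '$' while $_ is interpolated
    push @generated, "\$count{$_} += \$line =~ tr/$_/$_/,1";
}
print "$_\n" for @generated;
```

Each printed line, such as $count{F} += $line =~ tr/F/F/,1, is ordinary Perl with a literal character list, which is why tr/// accepts it once eval compiles it.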

        I think this is an elegant solution, and saves a lot of intricate fiddling.
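Since the string eval above is recompiled on every line for every letter, one possible refinement (a sketch, not from the thread) is to pay for the eval only once per letter by wrapping each tr/// in an anonymous sub, then call those subs on every line. The hash name %counter_for is hypothetical, and the sketch assumes the search items are plain letters (characters such as / or - would need escaping inside tr///).

```perl
use strict;
use warnings;

# Illustrative input, read from an in-memory file
my $text = "Sample header line\nFDIELSIGCOXLSAGICK\nFDIELSIGCOXLSAGICK\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @chars = qw/F G S/;

# Compile one counting sub per letter, once, up front.
# Each sub returns how many times its letter occurs in $_[0].
my %counter_for;
foreach my $char (@chars) {
    $counter_for{$char} = eval "sub { \$_[0] =~ tr/$char// }"
        or die "Could not compile counter for '$char': $@";
}

my $header_line = <$fh>;    # skip the header
my %count;
while (my $line = <$fh>) {
    $count{$_} += $counter_for{$_}->($line) foreach @chars;
}
close $fh;

print "There are $count{$_} occurrences of $_\n" foreach sort keys %count;
```

The letter list stays just as easy to edit, but the per-line work is now an ordinary sub call rather than a fresh compilation.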

        If you want to see a solution that uses index instead of tr///, you may...

        Dave

        Yet another version, aiming for speed like davido's tr/// approach, but without losing that speed to an eval on every pass just to make tr/// accept a variable. If this still isn't fast enough, you'll have to unroll it into four separate m/// statements.
        my $temp = <DATA>;
        print "Discarding the file header line.\n";
        print " ", $temp;
        my @chars = qw( A C G T );
        my %cnts;
        while( <DATA> ) {
            foreach my $char (@chars) {
                # my $cnt = () = m/$char/g;
                # $cnts{$char} += $cnt;
                # printf " For char '%s' I found %d\n", $char, $cnt;
                $cnts{$char} += () = m/$char/g;
            }
        }
        foreach my $char (@chars) {
            printf " Char '%s': %6d\n", $char, $cnts{$char};
        }
        __DATA__
        Generated by a completely confused program yesterday
        ACGTGACTAGAGGCCCGGGGAAAAAAAAAACCCCCCC
        ACCTGACTAGAGGCCCGGGGAAAAAAAAAACCCCCCC
        ACGTGACTAGAGGCCCGGGGAAAAAAAAAACCCCCCC
        AGGTGAGTAGAGGGGGGGGGAAAAAAAAAAGGGGGGG
        ACGTGACTAGAGGCCCGGGGAAAAAAAAAACCCCCCC
        Outputs
        Discarding the file header line.
            Generated by a completely confused program yesterday
          Char 'A':      70
          Char 'C':      49
          Char 'G':      56
          Char 'T':      10
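The line $cnts{$char} += () = m/$char/g; relies on a Perl idiom: m//g in list context returns every match, and a list assignment evaluated in scalar context returns the number of elements on its right-hand side, so assigning the matches to an empty list () yields the match count. A minimal sketch of the idiom on the first data line; the \Q...\E quoting is an addition not in the post, shown because an unquoted metacharacter such as . would match any character:

```perl
use strict;
use warnings;

my $seq = "ACGTGACTAGAGGCCCGGGGAAAAAAAAAACCCCCCC";

my %cnts;
foreach my $char (qw(A C G T)) {
    # List assignment in scalar context gives the number of matches
    $cnts{$char} = () = $seq =~ m/\Q$char\E/g;
}
printf "%s => %d\n", $_, $cnts{$_} for qw(A C G T);

# Why \Q...\E matters when the search character is a regex metacharacter
my $dot      = '.';
my $literal  = () = "a.b.c" =~ m/\Q$dot\E/g;    # matches only the two dots
my $any_char = () = "a.b.c" =~ m/$dot/g;        # '.' matches every character
print "literal: $literal, unquoted: $any_char\n";
```

For the single uppercase letters in this thread the quoting changes nothing, so the original code is correct as written.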
        
        Hi there!

        Just a few mistakes:

        <FILE>; # ignore first line
        my @words = (A, C, G, T); #letters to search
        my @file = do {local $/; split (//, <FILE>); }; #split each letter into an array
        foreach $file (@file){
            foreach $words (@words){
                if ($file eq $words){
                    $word_list{$file}++;
                }
            }
        }
        @pairs = sort {$a->[1] <=> $b->[1]} map {[ $_ => $word_list{$_}]} keys %word_list;
        print "word $_->[0] = $_->[1]\n" for @pairs;


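        You were only splitting the first line of the file (and split /\w/ splits on the letters rather than into them), your counter used the literal key "words" instead of $file, and there was a typo on the last line ("$_ ->[0]"). The code above will hopefully work as you intended.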
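The corrected code slurps the rest of the file into one array of characters, which is fine for modest files. For very large sequence files, one alternative (a sketch, not from the thread) is to read one line at a time and count only the wanted letters through a lookup hash. The %wanted hash and the in-memory sample data are illustrative assumptions.

```perl
use strict;
use warnings;

# Illustrative input: a '>' header line, then sequence lines (in-memory file)
my $text = ">header information\nACGTTGCA\nAAACCGN\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %wanted = map { $_ => 1 } qw(A C G T);    # letters to count
my %word_list;

my $header = <$fh>;    # ignore the first (header) line
while (my $line = <$fh>) {
    chomp $line;
    # Split the line into single characters and count only the wanted ones
    $word_list{$_}++ for grep { $wanted{$_} } split //, $line;
}
close $fh;

# Same report as the reply: ascending by count
my @pairs = sort { $a->[1] <=> $b->[1] } map { [ $_ => $word_list{$_} ] } keys %word_list;
print "word $_->[0] = $_->[1]\n" for @pairs;
```

Memory use now depends on the longest line rather than the whole file, and unwanted characters (the N here) are simply skipped.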