Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm new to Perl. I have an input file, "Data.txt", in the following format:

Db10g029860.2 7
Db10g029860.2 1
Db10g029860.2 2
Db10g029860.2 1
Db10g029860.2 4
Db10g029860.2 2
Db10g029860.2 6
Db10g029860.2 11
Db96g938791.0 5
Db96g938791.0 9
Db96g938791.0 1
Db96g938791.0 3
Db96g938791.0 7
Db04g787390.1 6
Db04g787390.1 5
Db04g787390.1 5
Db04g787390.1 12
...

I need to find the total for each unique ID.

Expected output:

Db10g029860.2 34
Db96g938791.0 25
Db04g787390.1 28
...
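The expected output lists the IDs in the order they first appear in the input, which a plain hash does not preserve. Below is a minimal sketch, assuming that order matters. It keeps a hash of running totals plus an array that records each ID the first time it is seen. The sub name `totals_in_input_order` is hypothetical, and the in-memory filehandle (`open` on a reference to a string) stands in for `Data.txt` so the example runs on its own. With a real file, open `Data.txt` with three-argument `open` instead.

```perl
use strict;
use warnings;

# Hypothetical helper: sums the counts per ID and remembers the order in
# which each ID first appears. Returns a list of [ID, total] pairs.
sub totals_in_input_order {
    my ($fh) = @_;
    my (%total, @order);
    while (my $line = <$fh>) {
        my ($id, $count) = split ' ', $line;   # ' ' splits on runs of whitespace
        next unless defined $count;            # skip blank or malformed lines
        push @order, $id unless exists $total{$id};
        $total{$id} += $count;
    }
    return map { [ $_, $total{$_} ] } @order;
}

# Sample data from the question, read through an in-memory filehandle
# so the example runs without creating Data.txt.
my $sample = <<'END';
Db10g029860.2 7
Db10g029860.2 1
Db10g029860.2 2
Db10g029860.2 1
Db10g029860.2 4
Db10g029860.2 2
Db10g029860.2 6
Db10g029860.2 11
Db96g938791.0 5
Db96g938791.0 9
Db96g938791.0 1
Db96g938791.0 3
Db96g938791.0 7
Db04g787390.1 6
Db04g787390.1 5
Db04g787390.1 5
Db04g787390.1 12
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
for my $pair (totals_in_input_order($in)) {
    print "$pair->[0] $pair->[1]\n";   # ID, a space, then its total
}
close $in;
```

On the sample data this prints the three lines from the expected output above, in the same order: 34, 25, 28.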

Re: Count ID values
by Anonymous Monk on Mar 20, 2010 at 06:49 UTC
      use strict;
      use warnings;
      use Data::Dumper;

      open FH,"<file" or die "can't Open $!\n";
      my $line;
      my %hash;
      while($line=<FH>) {
          my($value,$value2)=split(' ',$line);
          $hash{$value}+=$value2;
      }
      foreach my $key (keys%hash) {
          print "$key => $hash{$key}\n"
      }
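This thread splits lines in two ways: `split(' ', $line)` in this reply and `split /\s+/` in a later one. Both give the same result on the question's data. They differ on a line that starts with whitespace. The comparison below uses a made-up input line that is not from the question.

```perl
use strict;
use warnings;

# A made-up input line with leading whitespace (not from the question's data).
my $line = "  Db10g029860.2 7\n";

# split ' ' is a special case: leading whitespace is ignored.
my @awk_style = split ' ', $line;

# split /\s+/ treats the leading spaces as a separator, so the first field is empty.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style),   " fields: [", join('|', @awk_style),   "]\n";   # 2 fields: [Db10g029860.2|7]
print scalar(@regex_style), " fields: [", join('|', @regex_style), "]\n";   # 3 fields: [|Db10g029860.2|7]
```

With `split /\s+/`, such a line would produce an empty ID and put the ID string where the count should be. For that reason, `split ' '` is usually the safer choice for whitespace-separated columns.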
        See the following code:

        open(FH,"file");
        foreach $line (<FH>) {
            if($line=~/^Db/) ### Get only the ID lines.
            {
                ($id,$count)=split(' ',$line);
                $count{$id}+=$count;
            }
        }
        print "ID\t\tCount\n";
        foreach $key (keys%count) {
            print "$key\t$count{$key}\n"
        }
        --$ugum@r--
      use Data::Dumper;
      open FH,"DATA.txt" or die $!;
      my %hash;
      my ($a,$b);
      while(<FH>) {
          ($a,$b)=split(' ');
          $hash{$a}+=$b;
      }
      print Dumper \%hash;

        Please try to provide good sample code for people new to Perl. $a and $b are special variables (sort uses them) and are poor choices for variable names in any case. Always use strictures (use strict; use warnings;). Use the three-argument form of open and use lexical file handles. Declare variables in the smallest scope that makes sense. Try to write a self-contained example that can be run. Consider:

        use strict;
        use warnings;

        my $fileName = 'data.txt';

        # Create the sample data file
        open my $outFile, '>', $fileName or die "Failed to create $fileName: $!";
        print $outFile <<SAMPLEDATA;
Db10g029860.2 7
Db10g029860.2 1
Db10g029860.2 2
Db10g029860.2 1
Db10g029860.2 4
Db10g029860.2 2
Db10g029860.2 6
Db10g029860.2 11
Db96g938791.0 5
Db96g938791.0 9
Db96g938791.0 1
Db96g938791.0 3
Db96g938791.0 7
Db04g787390.1 6
Db04g787390.1 5
Db04g787390.1 5
Db04g787390.1 12
SAMPLEDATA
        close $outFile;

        # Process the input file
        open my $fileIn, '<', $fileName or die "Unable to open $fileName: $!";
        my %counts;
        while(<$fileIn>) {
            chomp;
            my ($id, $count) = split /\s+/;
            next if ! $count;
            $counts{$id} += $count;
        }

        # Print the result
        for my $id (sort keys %counts) {
            printf "%-12s: %5d\n", $id, $counts{$id};
        }

        Prints:

        Db04g787390.1:    28
        Db10g029860.2:    34
        Db96g938791.0:    25

        True laziness is hard work
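The advice above calls $a and $b special. They are the package variables that `sort` sets to the two items being compared. The sketch below applies them to a common follow-up task: ordering the totals from highest to lowest. The %counts values are copied from the question's expected output rather than recomputed. If a lexical `my $a` were in scope, as in the Data::Dumper reply, perl would refuse to compile this sort block and would report "Can't use "my $a" in sort comparison".

```perl
use strict;
use warnings;

# Totals copied from the question's expected output (not recomputed here).
my %counts = (
    'Db10g029860.2' => 34,
    'Db96g938791.0' => 25,
    'Db04g787390.1' => 28,
);

# sort() sets the package variables $a and $b to the two keys being compared.
# Highest total first; ties (none in this sample) fall back to ID order.
my @by_total = sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts;

printf "%-13s %3d\n", $_, $counts{$_} for @by_total;
```

Swapping `$a` and `$b` in the numeric comparison reverses the order. That is why these two names are best left to `sort`.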