avanta has asked for the wisdom of the Perl Monks concerning the following question:

I have a file which contains data in the format

2010-02-12_aaa,654
2010-02-12_aaa,248
2010-02-12_bbb,374
2010-02-12_ccc,158
2010-02-13_aaa,745
2010-02-13_bbb,786
2010-02-13_bbb,354
2010-02-13_ddd,852
2010-02-14_bbb,754
2010-02-14_aaa,169
2010-02-14_ccc,965
2010-02-14_ccc,756

I need to treat the part before the comma ',' as a unique key. Thus my output should be

2010-02-12_aaa,902
2010-02-12_bbb,374
2010-02-12_ccc,158
2010-02-13_aaa,745
2010-02-13_bbb,1140
2010-02-13_ddd,852
2010-02-14_bbb,754
2010-02-14_aaa,169
2010-02-14_ccc,1721

i.e. the portion "2010-02-14_aaa" is the key, and the values are added up whenever a key repeats. I thought of trying "uniq" from List::MoreUtils, but that only removes the duplicate entries, whereas here I want the aggregated data.

use List::MoreUtils qw(uniq);
my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

Any suggestions on what I should use?

Thanks
AvantA

Replies are listed 'Best First'.
Re: create unique key in the entries in a file
by FunkyMonk (Bishop) on Feb 13, 2010 at 18:45 UTC
    Use split to separate each line into a key and value. Use a hash to sum over each key.
    my %sum_of;
    while (<DATA>) {
        my ($k, $v) = split /,/;
        $sum_of{$k} += $v;
    }
    print "$_,$sum_of{$_}\n" for sort keys %sum_of;
    __DATA__
    2010-02-12_aaa,654
    2010-02-12_aaa,248
    2010-02-12_bbb,374
    2010-02-12_ccc,158
    2010-02-13_aaa,745
    2010-02-13_bbb,786
    2010-02-13_bbb,354
    2010-02-13_ddd,852
    2010-02-14_bbb,754
    2010-02-14_aaa,169
    2010-02-14_ccc,965
    2010-02-14_ccc,756

    Output:

    2010-02-12_aaa,902
    2010-02-12_bbb,374
    2010-02-12_ccc,158
    2010-02-13_aaa,745
    2010-02-13_bbb,1140
    2010-02-13_ddd,852
    2010-02-14_aaa,169
    2010-02-14_bbb,754
    2010-02-14_ccc,1721
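Note that sorting the keys lists 2010-02-14_aaa before 2010-02-14_bbb, whereas the output requested in the question keeps the keys in the order they first appear. If that order matters, one possible approach is to record each key the first time it is seen. A minimal sketch, assuming "key,value" lines; the subroutine name `sum_in_first_seen_order` is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: sums the values per key and returns "key,total"
# strings in the order in which each key first appeared in the input.
sub sum_in_first_seen_order {
    my @lines = @_;
    my (%sum_of, @order);
    for my $line (@lines) {
        chomp(my $copy = $line);          # work on a copy, drop the newline
        next unless length $copy;         # skip empty lines
        my ($k, $v) = split /,/, $copy;
        push @order, $k unless exists $sum_of{$k};   # remember first sighting
        $sum_of{$k} += $v;
    }
    return map { "$_,$sum_of{$_}" } @order;
}

# The last four entries from the sample data in the question.
my @input = qw(
    2010-02-14_bbb,754
    2010-02-14_aaa,169
    2010-02-14_ccc,965
    2010-02-14_ccc,756
);
print "$_\n" for sum_in_first_seen_order(@input);
```

With this input the bbb row is printed before the aaa row, as in the question's sample output.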


    Unless I state otherwise, all my code runs with strict and warnings
      I wrote the following code to test the method you gave me.
      use strict;
      use warnings;

      my @content;
      my @files;
      my $dir;
      my $input1 = "source/host1/";
      my $input2 = "source/host2/";
      my $name = "volume_2010_02";
      #my $date = "2010-02-13";

      opendir DIR1, "$input1";
      my @files1= grep {$_ ne '.' && $_ ne '..' } readdir (DIR1);
      @files = @files1;
      #print @files,"\ninput file 1\n";
      $dir = $input1;
      #print $dir,"\ninput dir 1\n";
      my @content1 = &readfiles(@files,$dir);
      #print @content1,"\nFILE 1 Content\n\n";
      @content =();

      opendir DIR2, "$input1";
      my @files2= grep {$_ ne '.' && $_ ne '..' } readdir (DIR2);
      @files = @files2;
      #print @files,"\ninput file 2\n";
      $dir = $input2;
      #print $dir,"\ninput dir 2\n";
      my @content2 = &readfiles(@files,$dir);
      #print @content2,"\nFILE 2 Content\n\n";
      push @content1,@content2;

      my @file = sort {$a cmp $b} @content1;
      print @file,"\nPush output\n\n";

      my %output;
      my $print;
      my $semiprint;
      foreach(@file)
      {
          #print $_,"\nCOntent1\n\n";
          my ($key,$val) = split (/\,/,$_);
          $semiprint .= "$key,$val\n";
          #print $semiprint,"\nsemi in loop\n\n";
          $output{$key} += $val;
          $print .="$key,$output{$key}\n";
          #print $print,"\nFinal output\n\n";
      }
      #print $print,"\nFinal output\n\n";
      #print $semiprint,"\nsemi output\n\n";
      open OUTPUT, "> target/1.dat";
      print OUTPUT $print;

      sub readfiles
      {
          my @file = @files;
          my $directory = $dir;
          foreach(@file)
          {
              #print $_,"\nFILE NAME\n";
              if($_ =~ /^$name/)
              {
                  #print $_,"\nFile found \n";
                  open FILE,"< $dir/$_";
                  while(<FILE>)
                  {
                      chomp;
                      push @content,$_;
                  }
              }
          }
          return @content;
      }
      Now, as you can see, I am reading the same data from two files in different directories (the data being the same as I gave earlier). So my final output should contain, for each key, the sum of its values across both input files. Following are the input files and the result.
      __CONTENT FILE 1__
      2010-02-12_aaa,654
      2010-02-12_bbb,374
      2010-02-12_ccc,158
      2010-02-13_aaa,745
      2010-02-13_bbb,786
      2010-02-13_ddd,852
      2010-02-14_bbb,754
      2010-02-14_aaa,169
      2010-02-14_ccc,965

      __CONTENT FILE 2__
      2010-02-12_aaa,654
      2010-02-12_bbb,374
      2010-02-12_ccc,158
      2010-02-13_aaa,745
      2010-02-13_bbb,786
      2010-02-13_ddd,852
      2010-02-14_bbb,754
      2010-02-14_aaa,169
      2010-02-14_ccc,965

      __OUTPUT__
      2010-02-12_aaa,654
      2010-02-12_aaa,1308
      2010-02-12_bbb,374
      2010-02-12_bbb,748
      2010-02-12_ccc,158
      2010-02-12_ccc,316
      2010-02-13_aaa,745
      2010-02-13_aaa,1490
      2010-02-13_bbb,786
      2010-02-13_bbb,1572
      2010-02-13_ddd,852
      2010-02-13_ddd,1704
      2010-02-14_aaa,169
      2010-02-14_aaa,338
      2010-02-14_bbb,754
      2010-02-14_bbb,1508
      2010-02-14_ccc,965
      2010-02-14_ccc,1930

      __EXPECTED OUTPUT__
      2010-02-12_aaa,1308
      2010-02-12_bbb,748
      2010-02-12_ccc,316
      2010-02-13_aaa,1490
      2010-02-13_bbb,1572
      2010-02-13_ddd,1704
      2010-02-14_aaa,338
      2010-02-14_bbb,1508
      2010-02-14_ccc,1930

      Can you tell me how I can omit the old data after aggregation? And where a key is not repeated, it should just be printed in the output as it is.

      Thanks
      AvantA
        You can't build your output string until you've processed all the entries in your files. Replace the print just before your subroutine with
        print OUTPUT "$_,$output{$_}\n" for sort keys %output;

        Your file will now contain:

        2010-02-12_aaa,1308
        2010-02-12_bbb,748
        2010-02-12_ccc,316
        2010-02-13_aaa,1490
        2010-02-13_bbb,1572
        2010-02-13_ddd,1704
        2010-02-14_aaa,338
        2010-02-14_bbb,1508
        2010-02-14_ccc,1930
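To see why the original loop produced the extra rows: appending to the output string inside the loop records the running total every time a key is seen, so a key that occurs twice contributes two rows. A small sketch of the difference; the subroutine names `running_rows` and `final_rows` are hypothetical, chosen only for this illustration:

```perl
use strict;
use warnings;

# Emits a row for every input line, so repeated keys yield running totals
# (this mirrors building the output string inside the loop).
sub running_rows {
    my (%total, @rows);
    for (@_) {
        my ($key, $val) = split /,/;
        $total{$key} += $val;
        push @rows, "$key,$total{$key}";
    }
    return @rows;
}

# Builds the rows only after all input has been summed: one row per key.
sub final_rows {
    my %total;
    for (@_) {
        my ($key, $val) = split /,/;
        $total{$key} += $val;
    }
    return map { "$_,$total{$_}" } sort keys %total;
}

# Two keys, each appearing once per input file, as in the post above.
my @data = (
    '2010-02-12_aaa,654', '2010-02-12_aaa,654',
    '2010-02-12_bbb,374', '2010-02-12_bbb,374',
);
print join("\n", running_rows(@data)), "\n\n";   # four rows, with partial sums
print join("\n", final_rows(@data)), "\n";       # two rows, totals only
```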

        Your next task is to tidy up your source code, it's a mess :-)

        Some hints:

        • You use strict and warnings. This is very good.
        • Always check that your opens and opendirs succeeded.
        • Don't call subroutines using &readfiles(...). You don't need the "&" and it doesn't do what you think it does.
        • Pick more meaningful variable names, @files and @files2 are rarely good choices.
        • You don't seem to understand the point of calling subroutines with parameters. See perlsub.
        • Always declare variables in the smallest possible scope. Coping with Scoping may help (it's old, but still worth a read).
        • Most importantly, enjoy your Perl programming.
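As a rough illustration of the hints above, the script could be restructured along these lines. This is only a sketch under assumptions, not the one right way to write it: the subroutine names (`read_matching_files`, `sum_by_key`, `aggregate_to_file`) are invented here, the paths and file-name prefix are taken from the code posted above, and the prefix is matched literally via `\Q...\E` rather than as a regex.

```perl
use strict;
use warnings;

# Returns the chomped lines of every plain file in $dir whose name
# starts with $prefix. Both values are passed in as parameters.
sub read_matching_files {
    my ($dir, $prefix) = @_;
    opendir my $dh, $dir or die "Cannot open directory '$dir': $!";
    my @names = sort grep { /^\Q$prefix\E/ } readdir $dh;
    closedir $dh;

    my @lines;
    for my $file_name (@names) {
        my $path = "$dir/$file_name";
        next unless -f $path;                      # skip subdirectories etc.
        open my $fh, '<', $path or die "Cannot open '$path': $!";
        chomp(my @file_lines = <$fh>);
        close $fh;
        push @lines, @file_lines;
    }
    return @lines;
}

# Sums the values per key over a list of "key,value" lines.
sub sum_by_key {
    my %sum_of;
    for my $line (@_) {
        next unless $line =~ /,/;                  # ignore lines without a comma
        my ($key, $value) = split /,/, $line;
        $sum_of{$key} += $value;
    }
    return %sum_of;
}

# Reads every directory, aggregates, then writes one "key,total" row per key.
sub aggregate_to_file {
    my ($out_path, $prefix, @dirs) = @_;
    my %sum_of = sum_by_key(map { read_matching_files($_, $prefix) } @dirs);
    open my $out, '>', $out_path or die "Cannot write '$out_path': $!";
    print {$out} "$_,$sum_of{$_}\n" for sort keys %sum_of;
    close $out or die "Cannot close '$out_path': $!";
    return;
}

# Directory layout from the original post; skipped if it is not present.
my @host_dirs = ('source/host1', 'source/host2');
my @missing   = grep { !-d $_ } @host_dirs, 'target';
if (@missing) {
    warn "Skipping run, missing directories: @missing\n";
}
else {
    aggregate_to_file('target/1.dat', 'volume_2010_02', @host_dirs);
}
```

Each subroutine receives what it needs through its argument list and keeps its variables lexical, so nothing depends on file-level globals such as @files or $dir.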
      You are simply a genius, man!! Thanks, I will try it ASAP, but I guess it will work. Thanks... :)
Re: create unique key in the entries in a file
by youlose (Scribe) on Feb 13, 2010 at 19:01 UTC
    use Modern::Perl;
    use Data::Dump qw(dump);
    
    my %result;
    for (<DATA>) {
    	my ($key,$num) = split /,/;
    	$result{$key} += $num;
    }
    
    say dump \%result;
    
    __DATA__
    2010-02-12_aaa,902
    2010-02-12_bbb,374
    2010-02-12_ccc,158
    2010-02-13_aaa,745
    2010-02-13_bbb,1140
    2010-02-13_ddd,852
    2010-02-14_bbb,754
    2010-02-14_aaa,169
    2010-02-14_ccc,1721
    
    
    Result:
    {
      "2010-02-12_aaa" => 902,
      "2010-02-12_bbb" => 374,
      "2010-02-12_ccc" => 158,
      "2010-02-13_aaa" => 745,
      "2010-02-13_bbb" => 1140,
      "2010-02-13_ddd" => 852,
      "2010-02-14_aaa" => 169,
      "2010-02-14_bbb" => 754,
      "2010-02-14_ccc" => 1721,
    }