I wrote the following code to test the method you gave me.
use strict;
use warnings;
my @content;
my @files;
my $dir;
my $input1 = "source/host1/";
my $input2 = "source/host2/";
my $name = "volume_2010_02";
#my $date = "2010-02-13";
opendir DIR1, "$input1";
my @files1= grep {$_ ne '.' && $_ ne '..' } readdir (DIR1);
@files = @files1;
#print @files,"\ninput file 1\n";
$dir = $input1;
#print $dir,"\ninput dir 1\n";
my @content1 = &readfiles(@files,$dir);
#print @content1,"\nFILE 1 Content\n\n";
@content =();
opendir DIR2, "$input2";
my @files2= grep {$_ ne '.' && $_ ne '..' } readdir (DIR2);
@files = @files2;
#print @files,"\ninput file 2\n";
$dir = $input2;
#print $dir,"\ninput dir 2\n";
my @content2 = &readfiles(@files,$dir);
#print @content2,"\nFILE 2 Content\n\n";
push @content1,@content2;
my @file = sort {$a cmp $b} @content1;
print @file,"\nPush output\n\n";
my %output;
my $print;
my $semiprint;
foreach(@file)
{
    #print $_,"\nCOntent1\n\n";
    my ($key,$val) = split (/\,/,$_);
    $semiprint .= "$key,$val\n";
    #print $semiprint,"\nsemi in loop\n\n";
    $output{$key} += $val;
    $print .="$key,$output{$key}\n";
    #print $print,"\nFinal output\n\n";
}
#print $print,"\nFinal output\n\n";
#print $semiprint,"\nsemi output\n\n";
open OUTPUT, "> target/1.dat";
print OUTPUT $print;
sub readfiles
{
    my @file = @files;
    my $directory = $dir;
    foreach(@file)
    {
        #print $_,"\nFILE NAME\n";
        if($_ =~ /^$name/)
        {
            #print $_,"\nFile found \n";
            open FILE,"< $dir/$_";
            while(<FILE>)
            {
                chomp;
                push @content,$_;
            }
        }
    }
    return @content;
}
Now, as you can see, I am reading the same data from two files (the same data I gave earlier) in different directories. So my final output should contain the sum of the values for all matching keys across both input files. Following are the input files and the result.
__CONTENT FILE 1__
2010-02-12_aaa,654
2010-02-12_bbb,374
2010-02-12_ccc,158
2010-02-13_aaa,745
2010-02-13_bbb,786
2010-02-13_ddd,852
2010-02-14_bbb,754
2010-02-14_aaa,169
2010-02-14_ccc,965
__CONTENT FILE 2__
2010-02-12_aaa,654
2010-02-12_bbb,374
2010-02-12_ccc,158
2010-02-13_aaa,745
2010-02-13_bbb,786
2010-02-13_ddd,852
2010-02-14_bbb,754
2010-02-14_aaa,169
2010-02-14_ccc,965
__OUTPUT__
2010-02-12_aaa,654
2010-02-12_aaa,1308
2010-02-12_bbb,374
2010-02-12_bbb,748
2010-02-12_ccc,158
2010-02-12_ccc,316
2010-02-13_aaa,745
2010-02-13_aaa,1490
2010-02-13_bbb,786
2010-02-13_bbb,1572
2010-02-13_ddd,852
2010-02-13_ddd,1704
2010-02-14_aaa,169
2010-02-14_aaa,338
2010-02-14_bbb,754
2010-02-14_bbb,1508
2010-02-14_ccc,965
2010-02-14_ccc,1930
__EXPECTED OUTPUT__
2010-02-12_aaa,1308
2010-02-12_bbb,748
2010-02-12_ccc,316
2010-02-13_aaa,1490
2010-02-13_bbb,1572
2010-02-13_ddd,1704
2010-02-14_aaa,338
2010-02-14_bbb,1508
2010-02-14_ccc,1930
Can you tell me how I can omit the old data after aggregation? And if a key has no match in the other file, it should just be printed in the output as it is.
Thanks AvantA
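One possible way to get the expected output, sketched under the assumption that every line has the form key,value: the intermediate lines show up because the output string is appended to inside the loop, while the sums are still growing. Accumulating all the sums in the hash first, and building the output only after the loop has finished, leaves one line per key. The sub name aggregate_lines and the inline sample data below are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: sums the values per key and returns one
# "key,total" line per distinct key, sorted by key.
sub aggregate_lines {
    my @lines = @_;
    my %total;
    foreach my $line (@lines) {
        my ($key, $val) = split /,/, $line;
        # Skip blank or malformed lines instead of warning on undef.
        next unless defined $key && defined $val;
        $total{$key} += $val;
    }
    # The output is built only after every value has been added.
    return map { "$_,$total{$_}" } sort keys %total;
}

# Sample data standing in for the lines read from the two directories.
my @content = (
    "2010-02-12_aaa,654", "2010-02-12_bbb,374",
    "2010-02-12_aaa,654", "2010-02-12_bbb,374",
    "2010-02-13_ddd,852",
);

print "$_\n" for aggregate_lines(@content);
# prints:
# 2010-02-12_aaa,1308
# 2010-02-12_bbb,748
# 2010-02-13_ddd,852
```

A key that occurs only once (2010-02-13_ddd here) simply comes out with its own value. In the script above, the same effect should follow from moving the `$print .= "$key,$output{$key}\n";` line out of the foreach loop and into a second loop over `sort keys %output`.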