All that is required is a very slight tweak to the $name regex statement: if the name ends in _refX, that part is deleted; otherwise the name is left alone.
I suppose you are new to Perl. Note that one of the true "power hitter" features of Perl is that code like this needs no indices at all (no [$i] stuff). The only "if" statement is pretty much optional, provided that you have a good data file to work with, as it only skips completely blank lines. Play with the code. You will also notice that the order of the files doesn't matter (there is no special case for the first file).
Have fun and happy Perling!
#!/usr/bin/perl -w
use strict;

my @files = qw(file1.dat file2.dat file3.dat);
my %hash;    # name => reference to an array of every token seen for that name

foreach my $file (@files)
{
    open(my $fh, "<", $file) or die "can't open $file $!";
    while (<$fh>)
    {
        next if /^\s*$/;    # simply skips blank lines
        chomp;
        my ($name, @tokens) = split(/,/, $_);
        $name =~ s/_ref\d+$//;    # drop a trailing _refN, if there is one
        push @{$hash{$name}}, @tokens;
    }
    close($fh);
}

foreach my $name (sort keys %hash)
{
    print "$name @{$hash{$name}}\n";
}
__END__
prints:
aaa_1 4 5 8 9
bbb 2 3 5 10
ccc_1 5 6 6 11
Thanks for the help. I have one more question: how would you process each of the values in a hash of arrays? For example, I want to add 5+8+9 and then push the total, giving aaa_1 4 5 8 9 22.
Can you also briefly explain how the data gets populated into a single hash?
The structure that I demonstrated is a Hash of Arrays. A hash can only have one value for each key, BUT that value can be a reference (a pointer) to another data structure, in this case an anonymous array. To initialize an @variable, you would normally have something like: my @variable = (4,5,6); That puts 3 things into @variable. If you have @variable = [4,5,6];, that puts ONE thing into @variable: a REFERENCE to an array with 3 things in it. That array with the 3 things has no name of its own and is called an anonymous array.
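To make the difference between the round and square brackets concrete, here is a small sketch. The variable names @list and @nested are mine, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list   = (4, 5, 6);    # three elements
my @nested = [4, 5, 6];    # ONE element: a reference to an anonymous array

print "list has ",   scalar(@list),   " elements\n";
#prints list has 3 elements
print "nested has ", scalar(@nested), " element\n";
#prints nested has 1 element
print "nested holds a reference of type ", ref($nested[0]), "\n";
#prints nested holds a reference of type ARRAY
print "inner values: @{$nested[0]}\n";
#prints inner values: 4 5 6
```

The ref() built-in is a handy way to check what kind of thing a reference points to while you are experimenting.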
All of the things that you can do with an array apply in this situation: simple assignment, push, pop, shift, unshift, etc. Below I show some other code with a more explicit-looking assignment that uses square brackets for the anonymous array. It looks a bit "funny" because this is an assignment of a value to a hash key, but the principle is the same as for a simple array.
So to sum up all the things in an array and push the total onto the array, we do it just as if this were a simple @var! See the code below.
I will point out that when you dereference a multi-dimensional thing with a subscript, you need an extra pair of {}, hence @{$hash{'aaa_1'}}.
Hope this answers your question...
#!/usr/bin/perl -w
use strict;
my %hash;
$hash{'aaa_1'}=[9, 8,7,6];
print "aaa_1 is : @{$hash{'aaa_1'}}\n";
#prints aaa_1 is : 9 8 7 6
my $array_ref = $hash{'aaa_1'}; #another way...
print "@$array_ref\n"; #de-references array ref into array
#prints 9 8 7 6
foreach my $num (@$array_ref) #another way
{
    print "a num is: $num\n";
}
#prints this...
#a num is: 9
#a num is: 8
#a num is: 7
#a num is: 6
#################
$hash{'aaa_1'}=[]; #clears anon array
push @{$hash{'aaa_1'}},(4,5,8,9);
print "aaa_1 is : @{$hash{'aaa_1'}}\n";
#prints aaa_1 is : 4 5 8 9
###############
my $sum=0;
foreach my $num (@{$hash{'aaa_1'}})
{
    $sum += $num;
}
push @{$hash{'aaa_1'}}, $sum;
print "aaa_1 is now : @{$hash{'aaa_1'}}\n";
#prints aaa_1 is now : 4 5 8 9 26
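As a possible alternative to the explicit loop, here is a sketch using sum from the core List::Util module. The question's 5+8+9 = 22 suggests the first value should perhaps be left out of the total. That is only my reading of the question, so the sketch computes both totals; the names $total_all, $total_rest and @values are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);    # core module

my %hash = (aaa_1 => [4, 5, 8, 9]);

# Total of every value in the anonymous array.
my $total_all = sum(@{$hash{'aaa_1'}});
print "sum of all values: $total_all\n";
#prints sum of all values: 26

# Total of everything except the first value, using an array slice.
# (Assumption: this is what the 5+8+9 in the question was asking for.)
my @values     = @{$hash{'aaa_1'}};
my $total_rest = sum(@values[1 .. $#values]);

push @{$hash{'aaa_1'}}, $total_rest;
print "aaa_1 is now : @{$hash{'aaa_1'}}\n";
#prints aaa_1 is now : 4 5 8 9 22
```

Be aware that sum returns undef for an empty list, so real data with empty rows would need a check first.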
Thanks, it worked for me. I understand the tweaking part, but it is still not clear to me how exactly it matches on the first field across all three files. Yes, I started learning Perl very recently. Thanks once again.
The key is in the regex $name =~ s/_ref\d+$//;
This just deletes things at the end of the string, like _ref3. If something like that is not there, then nothing happens! Once the suffix is gone, aaa_1 and aaa_1_ref3 are the same hash key, so their values land in the same array no matter which file they came from. As a general rule of thumb, do not create special cases like "for the first file we do X, otherwise we do Y" unless needed. Add some print statements to the code to see what it is doing. Run the code with different orders of files: you should get the same names, with the values for each name listed in the order in which the files were read. Perl regular expressions ("regex") are an integral part of the language, and you should master the use of \s \d \w and \S \D \W. You will go very far with them! Especially when used with the "anchors" ^ and $, which tie the match to the beginning of the string being tested or to its end.
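Here is a rough sketch of how lines from several files end up under one key. The real .dat files are not shown in this thread, so the sample lines and the helper name merge_lines are invented for illustration; the trace print is the kind of print statement I mean.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample lines standing in for the contents of the real .dat files.
my @file1 = ("aaa_1,4,5", "bbb,2,3");
my @file2 = ("aaa_1_ref2,8", "bbb_ref2,5");
my @file3 = ("aaa_1_ref3,9", "bbb_ref7,10");

# Hypothetical helper: folds a list of lines into a hash of arrays
# and returns a reference to that hash.
sub merge_lines
{
    my %hash;
    foreach my $line (@_)
    {
        my ($name, @tokens) = split(/,/, $line);
        $name =~ s/_ref\d+$//;
        print "line '$line' goes under key '$name'\n";    # trace
        push @{$hash{$name}}, @tokens;
    }
    return \%hash;
}

my $merged = merge_lines(@file1, @file2, @file3);
foreach my $name (sort keys %$merged)
{
    print "$name @{$merged->{$name}}\n";
}
#prints (after the trace lines):
#aaa_1 4 5 8 9
#bbb 2 3 5 10
```

Calling merge_lines(@file3, @file1, @file2) instead gives the same two keys; only the order of the values inside each array changes.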
s/_ref\d+$//; means that the match has to finish at the end of the string (that is what the $ symbol says): if the string ends in _ref followed by one or more digits (_ref3, _ref34, etc.), then that part is deleted. \d means exactly one digit, \d+ means one or more digits in a row, \d* means zero or more digits, and \d? means zero or one digit. The capital version \D means "not a digit", anything except 0-9. That's not used here, but that is what it would mean.
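A quick way to convince yourself is to run the substitution over a few names. This is only a sketch: the helper name strip_ref and the test names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the name with any trailing _refN removed.
sub strip_ref
{
    my ($name) = @_;
    $name =~ s/_ref\d+$//;
    return $name;
}

foreach my $name (qw(aaa_1 aaa_1_ref3 bbb_ref34 ccc_ref aaa_ref3_x))
{
    print "$name -> ", strip_ref($name), "\n";
}
#prints:
#aaa_1 -> aaa_1
#aaa_1_ref3 -> aaa_1
#bbb_ref34 -> bbb
#ccc_ref -> ccc_ref
#aaa_ref3_x -> aaa_ref3_x

# With \d* instead of \d+, zero digits are also accepted.
(my $loose = "ccc_ref") =~ s/_ref\d*$//;
print "with \\d*: ccc_ref -> $loose\n";
#prints with \d*: ccc_ref -> ccc
```

Note that ccc_ref is untouched by \d+ because there is no digit after _ref, and aaa_ref3_x is untouched because _ref3 is not at the end of the string.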