in reply to Merging partially duplicate lines

..or with a one-liner that is a bit complicated in the END block, but not so hard to read once deparsed:

perl -F"\s+" -ane "push @{$r{join (' 'x8,@F[0..3]) }}, [@F[4,5]]; END +{foreach $k(keys %r){my($x,$y);map {$x+=$$_[0];$y+=$$_[1]} @{$r{$k}}; +print qq($k\t),($x/scalar @{$r{$k}}),qq(\t$y\n)}}" uno.txt due.txt I 33 C C 0.75 4 I 21 B A 1 12 I 40 D D 1 7 I 56 A E 1 2 I 9 A B 0.275 14

which deparsed becomes

perl -MO=Deparse -F"\s+" -ane "push @{$r{join (' 'x8,@F[0..3]) }}, [@F[4,5]]; END{foreach $k(keys %r){my($x,$y);map {$x+=$$_[0];$y+=$$_[1]} @{$r{$k}};print qq($k\t),($x/scalar @{$r{$k}}),qq(\t$y\n)}}" uno.txt due.txt
LINE: while (defined($_ = <ARGV>)) {
    our(@F) = split(/\s+/, $_, 0);
    push @{$r{join ' ' x 8, @F[0..3]};}, [@F[4, 5]];
    sub END {
        foreach $k (keys %r) {
            my($x, $y);
            map {$x += $$_[0]; $y += $$_[1];} @{$r{$k};};
            print "$k\t", $x / scalar(@{$r{$k};}), "\t$y\n";
        }
    }
    ;
}
-e syntax OK

L*

PS: removed the unused Data::Dump.

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^2: Merging partially duplicate lines -- oneliner deparsed
by K_Edw (Beadle) on Jan 30, 2016 at 21:38 UTC
    Thanks. This works well, although it is quite difficult for me to read. For example, how and where in the script does the averaging of column 4 take place?
      You are welcome K_Edw, and.. sorry, I was in a hurry before dinner..

      The heart of the code is the creation of the needed data structure with

      push @{$r{join (' 'x8,@F[0..3]) }}, [@F[4,5]];
      We create the key of the hash %r as the stringified join of fields 0..3 of the autosplit @F array (see -F"\s+" and -a in perlrun). This gives us the uniqueness of the first four fields, used as a key. The value of that key is treated as an array, and into this array is pushed another, anonymous array, [@F[4, 5]], containing the last two fields. One such array is pushed every time the key is found again across the files read.
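      As a minimal standalone sketch of the same construction (the two sample lines below are made up for illustration):

      use strict;
      use warnings;

      my %r;
      # two hypothetical input lines, shaped like the rows of uno.txt / due.txt
      my @lines = ("I 33 C C 0.5 2", "I 33 C C 1 2");

      for (@lines) {
          my @F = split /\s+/;              # what -F"\s+" with -a does per input line
          my $key = join ' ' x 8, @F[0..3]; # first four fields, joined, become the hash key
          push @{ $r{$key} }, [ @F[4, 5] ]; # push an anonymous array holding the last two fields
      }
      # %r now maps the "I 33 C C" key to [[0.5, 2], [1, 2]]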

      Using Data::Dump's dd function as the first thing in the END block, you'll see the data structure:

      ( "I 33 C C", [[0.5, 2], [1, 2]], "I 21 B A", [[1, 6], [1, 6]], "I 40 D D", [[1, 2], [1, 5]], "I 56 A E", [[1, 2]], "I 9 A B", [[0.25, 6], ["0.30", 8]], )

      When all files are processed, the END block comes into play: for each key of the %r hash we use map to process all the arrays contained as values of that key: every first value is added to $x (these come from all the $F[4] values!) and every second value is added to $y (coming from all the $F[5] values). Vars $x and $y are declared with my, so they are reset for every key of the %r hash processed.

      Now that all is ready, and while we are still processing each key of the %r hash, we print the key, a tab, then $x divided by how many values we used (scalar @{$r{$k}}, i.e. the scalar value of the array contained in $r{$k}), which is the average you asked for. Then the total of $y, and we are done.
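      To answer your question directly: written out as a plain loop, the END block is equivalent to something like this sketch (same variable names as in the one-liner, with the sums initialized explicitly):

      foreach my $k (keys %r) {
          my ($x, $y) = (0, 0);              # fresh sums for each key
          for my $pair (@{ $r{$k} }) {
              $x += $pair->[0];              # accumulate all former $F[4] values
              $y += $pair->[1];              # accumulate all former $F[5] values
          }
          my $avg = $x / scalar @{ $r{$k} }; # sum divided by the number of pairs: the column 4 average
          print "$k\t$avg\t$y\n";            # key, average of column 4, total of column 5
      }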

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
        Thank you very much for the explanation! That makes a lot more sense now.