oko1 has asked for the wisdom of the Perl Monks concerning the following question:

Given a file that looks like this:

[ -1, 5, 1 ],
[ 0, 5, 1 ],
[ 0, 5, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],

I'm trying to collapse repeated lines that have a 0 in the 1st or the 2nd position into a single line that sums up the non-zero element while ignoring the 3rd one - i.e., 7 repeats of [ 0, -5, 1 ] should be replaced with [ 0, -35, 1 ]. All other lines should just be printed out. Unfortunately, my brain has decided to take a vacation today, and isn't being helpful - so I'm hoping for some Wisdom from the Monastery. :)

-- 
I hate storms, but calms undermine my spirits.
 -- Bernard Moitessier, "The Long Way"

Re: Summing up duplicate lines
by choroba (Cardinal) on May 08, 2024 at 19:56 UTC
    The specification is still vague. What should the third value be if it's not the same, e.g. [0, 5, 1] and [0, 5, 2]? Also, in what order should the triplets be printed?

    The following code uses the order of the first appearance of the first two columns, and uses the third value from the first occurrence.

    Update: The %seen hash contains the index into the @out array, i.e. the index of the first occurrence of the two values.

    #!/usr/bin/perl
    use strict;
    use feature qw{ say };
    use warnings;

    my %seen;
    my @out;
    while (<DATA>) {
        my @n = /-?[0-9]+/g;
        if ($n[0] == 0 || $n[1] == 0) {
            if (exists $seen{"@n[0, 1]"}) {
                $out[ $seen{"@n[0, 1]"} ][$_] += $n[$_] for 0, 1;
            } else {
                push @out, \@n;
                $seen{"@n[0, 1]"} = $#out;
            }
        } else {
            push @out, \@n;
        }
    }
    say "[@$_]" for @out;

    __DATA__
    [ -1, 5, 1 ],
    [ 0, 5, 1 ],
    [ 0, 5, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],

    Output:

    [-1 5 1]
    [0 10 1]
    [1 5 1]
    [3 4 1]
    [5 1 1]
    [30 0 1]
    [0 -45 1]
    [-23 -64 0]
    [-5 0 1]
    [-5 1 1]

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      You're right - I didn't even think to specify this. There's no case in which the third value can be different among repeats, so it's not an issue. As to the order of the triplets, it would be the original one except for the "collapsed" ones.

      Your solution looks great - thank you, much appreciated!

Re: Summing up duplicate lines
by hv (Prior) on May 08, 2024 at 20:21 UTC

    The first rule for any problem like this is: define a signature such that elements which should be seen as the same have the same signature, then use a hash with signatures as keys to group things.

    I think by "repeated lines" you mean lines in which all three values are the same, so the signature should include all three values:

    sub signature {
        my $arrayref = shift;
        # the examples imply that the inputs are integers
        # if not, use a different character to join them
        return join '.', @$arrayref;
    }

    Once you have a signature, you can use a hash to accumulate any duplicates; in this case we just need a count:

    my %seen;
    for my $arrayref (@all_inputs) {
        if ($arrayref->[0] && $arrayref->[1]) {
            # not interesting, just pass it through
            print_result($arrayref);
        } else {
            # get a signature, increase the count for that signature by 1
            ++$seen{ signature($arrayref) };
        }
    }
    # now construct combined results for the saved inputs
    for my $sig (keys %seen) {
        my $count = $seen{$sig};
        my $arrayref = [ split /\./, $sig ];
        # keep it easy: multiplying the zero value is a noop
        $arrayref->[$_] *= $count for (0, 1);
        print_result($arrayref);
    }

    For your example data, this ends up with combined lines [ 0, 10, 1 ], [ -5, 0, 1 ], [ 30, 0, 1 ], [ 0, -45, 1 ].

      I'm afraid that didn't work - your output (once I created a 'sub print_result', that is :) ) looked like this:

      [ -1, 5, 1 ], [ 1, 5, 1 ], [ 3, 4, 1 ], [ 5, 1, 1 ], [ -23, -64, 0 ], [ -5, 1, 1 ], [ 30, 0, 1 ], [ -5, 0, 1 ], [ 0, -45, 1 ], [ 0, 10, 1 ]
      which skipped some lines. For comparison, choroba's was
      [-1 5 1] [0 10 1] [1 5 1] [3 4 1] [5 1 1] [30 0 1] [0 -45 1] [-23 -64 0] [-5 0 1] [-5 1 1]

      which was correct.


        Actually, in terms of output content both are identical. The difference is that the two solutions give a different line order.
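        One way to see this at a glance (an illustrative sketch I am adding here, not code from the thread; the two strings below are the combined lines quoted from the replies above): extract the triplets from each output regardless of separator style, canonicalize them, and compare the sorted sets.

```perl
#!/usr/bin/perl
# Illustrative check (not from the thread): the combined lines from the
# two solutions hold the same triplets; only order and separators differ.
use strict;
use warnings;

# the combined lines each solution reports, in each poster's own format
my $hv      = '[ 0, 10, 1 ], [ -5, 0, 1 ], [ 30, 0, 1 ], [ 0, -45, 1 ]';
my $choroba = '[0 10 1] [30 0 1] [0 -45 1] [-5 0 1]';

# pull out each bracketed triplet, normalize its separators, and sort
sub triplets {
    my ($s) = @_;
    return sort map { join ',', split /[,\s]+/ } $s =~ /\[\s*([^\]]+?)\s*\]/g;
}

my @a = triplets($hv);
my @b = triplets($choroba);
print "@a" eq "@b" ? "same content\n" : "different content\n";
```

        Run on the quoted lines, this prints "same content": the multisets of triplets match even though the line order does not.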

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Summing up duplicate lines
by tybalt89 (Monsignor) on May 08, 2024 at 22:56 UTC

    TIMTOWTDI

    #!/usr/bin/perl
    use strict;    # https://perlmonks.org/?node_id=11159342
    use warnings;
    use List::Util qw( sum );

    local $_ = join '', <DATA>;
    s/(?:\[ 0, \S+(,.*\n)){2,}/
      my $last = $1;
      '[ 0, ' . (sum $& =~ m~, (\S+),~g) . $last/ge;
    s/(?:\[ .*(, 0, .*\n)){2,}/
      my $last = $1;
      '[ ' . (sum $& =~ m~\[ (\S+),~g) . $last/ge;
    print;

    __DATA__
    [ -1, 5, 1 ],
    [ 0, 5, 1 ],
    [ 0, 5, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],

    which outputs:

    [ -1, 5, 1 ],
    [ 0, 10, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 30, 0, 1 ],
    [ 0, -45, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],
Re: Summing up duplicate lines
by tybalt89 (Monsignor) on May 10, 2024 at 15:13 UTC

    And more TIMTOWTDI. It's only three lines of code if you don't count the 'use's.

    #!/usr/bin/perl
    use strict;    # https://perlmonks.org/?node_id=11159342
    use warnings;

    my @collapsed = eval <DATA>;
    $_->[0] == 0 && $collapsed[-1][0] == 0 ? ( $collapsed[-1][1] += $_->[1] )
      : $_->[1] == 0 && $collapsed[-1][1] == 0 ? ( $collapsed[-1][0] += $_->[0] )
      : push @collapsed, $_
      for map eval, <DATA>;
    printf "[ %d, %d, %d ],\n", @$_ for @collapsed;

    __DATA__
    [ -1, 5, 1 ],
    [ 0, 5, 1 ],
    [ 0, 5, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],

    which outputs:

    [ -1, 5, 1 ],
    [ 0, 10, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 30, 0, 1 ],
    [ 0, -45, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],
Re: Summing up duplicate lines
by LanX (Saint) on May 09, 2024 at 10:35 UTC
    Your description leaves too much room for interpretation.

    I assume you just want to "collapse" consecutive lines and you don't care about "repetitions" which are separated.

    Hence no %seen hash is needed; just remember the $last line (an array) and compare it to the $current one.

    If the $current one fits your criteria, add it to $last; if it doesn't, print $last and set $last = $current for the next iteration.
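    That loop can be sketched like this (my illustration of the idea just described, not LanX's code; the sample @in is made up, and a separate $key/$acc pair is kept so that summing into the accumulator doesn't clobber the line being compared against):

```perl
#!/usr/bin/perl
# Minimal sketch of the "$last vs. $current" run-collapsing idea
# (an illustration, not LanX's implementation; @in is sample data).
use strict;
use warnings;

my @in = ( [-1, 5, 1], [0, 5, 1], [0, 5, 1], [1, 5, 1],
           [0, -5, 1], [0, -5, 1], [0, -5, 1] );

my @out;
my ($key, $acc);    # $key: first line of the current run, $acc: running sum
for my $current (@in) {
    # a line is collapsible only if column 0 or 1 is zero
    my ($zero) = grep { $current->[$_] == 0 } 0, 1;
    if ( defined $zero && $key && "@$current" eq "@$key" ) {
        # exact repeat of the run's line: sum the non-zero column
        $acc->[ 1 - $zero ] += $current->[ 1 - $zero ];
    }
    else {
        # run ended: emit the accumulated line, start a new run
        push @out, $acc if $acc;
        $key = $current;
        $acc = [ @$current ];    # copy, so summing leaves $key intact
    }
}
push @out, $acc if $acc;    # don't forget the final run

print "[ @$_ ]\n" for @out;
```

    For the sample @in this prints four lines: the two [0, 5, 1] repeats collapse to [ 0 10 1 ] and the three [0, -5, 1] repeats to [ 0 -15 1 ], while the other lines pass through unchanged.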

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      Here we go. Implementing it revealed more ambiguities:

      use v5.12;
      use warnings;
      use Data::Dump;

      my @in = do { local $/; eval <DATA> };

      sub DBG { ddx @_ if 0 }    # debug

      DBG "INPUT" => @in;

      my @out;
      my $similarity = 3;    # number of equal elements

      my $last = shift @in;
      my $sum  = [ @$last ];    # init

      while ( my $cur = shift @in ) {
          my @zeros = grep { $cur->[$_] == 0 } 0 .. 1;
          my @same  = grep { $cur->[$_] == $last->[$_] } 0 .. $similarity - 1;

          if ( @zeros and @same == $similarity ) {
              my $non_zero = 1 - $zeros[0];
              DBG "SUM" => $last, $cur, \@zeros, $non_zero;
              $sum->[$non_zero] += $cur->[$non_zero];
          }
          else {
              DBG "OUT" => $last, $cur, \@zeros;
              push @out, $sum;
              $last = $cur;
              $sum  = [ @$last ];    # init
          }
      }
      push @out, $sum;

      dd @out;

      __DATA__
      [ -1, 5, 1 ],
      [ 0, 5, 1 ],
      [ 0, 5, 1 ],
      [ 1, 5, 1 ],
      # separated repetitions
      # [ 0, 5, 1 ],
      # [ 0, 5, 1 ],
      [ 3, 4, 1 ],
      [ 5, 1, 1 ],
      # Testcases for ambiguities
      #[ 1000, 0, 1 ],
      #[ 5, 0, 1000 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ -23, -64, 0 ],
      [ -5, 0, 1 ],
      [ -5, 1, 1 ],

      (
        [-1, 5, 1],
        [0, 10, 1],
        [1, 5, 1],
        [3, 4, 1],
        [5, 1, 1],
        [30, 0, 1],
        [0, -45, 1],
        [-23, -64, 0],
        [-5, 0, 1],
        [-5, 1, 1],
      )


      Update: fixed the final push @out, $sum;

      Heh, perhaps that's something else that was under-specified in my question. :) I understand the idea behind it - it's not a complex one - but my brain was just refusing to produce the code to match it. It was just a weird day when I was unable to focus (for Various But Definite Reasons); any programmer who's been around a while has seen this pattern many times.
      I certainly appreciate the help you folks provided, though!

        Actually, this collapsing/folding was harder to implement than I thought. Too many off-by-one edge cases.

        Still looking into simplifying the code.
